CHAPTER I


STATISTICS AS A SCIENCE: AXIOMS OF 
PROBABILITY 

1. Introductory. The word “statistics” is defined in
the Concise Oxford Dictionary as follows: in the plural,
“numerical facts systematically collected, as statistics of
population, crime”; in the singular, “science of collecting,
classifying and using statistics.” This definition adequately
conveys the present meaning of the word; but the term
was once restricted, as its derivation shows, to systematic
collections of data descriptive of political communities, a
domain partly taken over now by the more special word
“demography.”

The word statistics (in the plural) is used nowadays 
to characterize “ numerical facts systematically collected ” 
in any field whatever of observation or experiment. 
The technique of collecting data and the principles 
to be heeded in order to avoid bias in the interpretation 
are described at length and exemplified in chapters of more 
extensive treatises which the reader may consult. He may 
also form a general idea of practical details by studying 
the prefatory description of method in some actual published 
investigation, for example into housing and economic 
conditions in a particular town or area. In any case the 
principles to be observed in arranging a statistical investi- 
gation can be thoroughly grasped only when the analysis 
used to interpret the data is well understood; and this
involves a knowledge of the science of statistics (in the
singular). 

The intermediate stage of tabulation, by which collected
data are set out in the most perspicuous form for analysis
or inspection with a particular aim, is also usually the 
subject of a chapter, with illustrative examples and 
criticisms, in larger treatises than the present one. Here 
again the reader may learn much from the attentive 
perusal of statistical year-books and similar publications, 
and from the results tabulated in other published investiga-
tions. The principles are those of logical classification of
different categories ; and the art of tabulation rests in 
making the relation of the categories and the numbers 
in various categories as clear as possible to the eye yet 
compact on the printed page. Thus one may have 
statistics of employed persons according to age, sex, 
district, trade and wage ; how can the respective numbers 
best be set out in one or more tables with rows and columns, 
row-totals, column-totals, sub-totals and grand totals? 
This is a typical problem of tabulation, and the chief aids
towards resolving it rest on experience and common sense. 

Statistics involves classification by number in categories. 
Let us note for further reference the possible relations of 
individuals in two categories A and B. It may be that an 
individual of the collection cannot be both A and B at the 
same time; for example if a coin falls “heads,” it certainly
has not fallen “tails.” The categories A and B are then
mutually exclusive; their relation is that of “either . . .
or.” On the other hand, the categories A and B may be
of such a kind that an individual may belong to both at
the same time; the relation of such categories is that of
“both . . . and.”

2. Statistics as a Science. The concern of the 
present book will for the most part be with statistics (in
the singular) as a science. The typical order of develop- 
ment of the “ exact ” sciences (as they are somewhat 
loosely called) has been along the following lines. First 
of all, the examination of data collected in a particular 
field of inquiry is found to disclose elements of regularity,
suggesting a law or laws. This is the stage of inductive
synthesis. These laws are expressed, if possible, in the
form of logical or numerical axioms, resembling those
of Euclidean geometry. The methods of logic and
mathematics are then brought into play to develop the
consequences of the axioms, producing an assemblage of
theorems or propositions. This department of the science,
namely the posing of axioms and the deduction of theorems,
is usually called the pure branch of the science. Even if
future observations should invalidate the axioms extrinsi- 
cally, the discrepancies between theory and fact being 
too great to be explained away, these axioms and the 
deductions based on them would still have an abstract 
validity, as a logical structure of propositions exempt 
from self-contradiction ; but for the description and 
explanation of the phenomena a new set of axioms would 
have to be found. On the other side, the corroborative 
part of the science consists in interpreting the abstract 
functions, formulae, equations, constants, invariants and 
the like, which occur in the pure formulation, as measures 
and measurable relations of actual phenomena, or numbers 
constructed from those measures in a definite way. This 
interpretative aspect constitutes the applied branch of the 
science. 

Such a division or dichotomy into pure and applied can 
be recognized in almost any science. A good example is 
Newtonian dynamics, according to which the motions of all
bodies in the universe were presumed to obey certain axioms 
and postulates, namely Newton’s laws of force and motion 
and the law of gravitation. Later experiments, more 
numerous, more delicate, more comprehensive, suggested 
that this formulation, though describing almost all observed 
dynamical phenomena with a precision unprecedented in 
history, did not sufficiently account for certain exceptional 
facts, such as the precession of the perihelion of Mercury. 
The discrepancies between prediction and actuality were 
extraordinarily small, but they were persistent. There thus 
arose a theory, or rather a succession of supplementary theories,
of relativity, formulated on a new axiomatic basis by which 
the discrepancies of the earlier one might be reconciled, or 
removed. This reformulation of hypotheses still proceeds, is 
still incomplete, and undergoes modification from time to time. 

What is the axiomatic basis of the science of statistics,
and what are the facts upon which the inductive synthesis 
is based ? The facts are certain regularities which have 
been observed in the proportionate frequency with which 
certain simple events happen or do not happen, when the 
circumstances under which they may occur are reconstructed 
again and again in repeated trials ; and the axioms, and 
the structure of theorems founded upon them, constitute 
the subject called mathematical probability. As for the
facts, anyone who is interested can collect a few for him- 
self. Spin an ordinary coin a large number of times, 
and one can hardly fail to notice that the proportions 
of heads and of tails are very nearly equal ; or shake 
a well-made die repeatedly from a dice-box and one will 
find that after many trials each face of the die has turned 
up in about one-sixth of the total number of trials. 

Example. The reader is recommended to experiment 
with simple repeated trials of this kind, and for future 
reference to record the results in sequence, in the order in 
which they occur. For example, the record of spins of a 
coin might be 

00101 01110 01101 00001 10111 ...
or the like, where “1” denotes “heads,” and “0” “tails.”
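A reader without a coin to hand may imitate the experiment with a pseudo-random generator. The sketch below is an illustration only: the function names `spin_record`, `format_record` and `relative_frequency`, the fixed seed, and the assumption that heads and tails are equally likely in the simulation are all ours, not part of the text.

```python
import random

def spin_record(n, seed=0):
    """Return a list of n simulated coin spins, 1 for heads and 0 for tails."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the record is reproducible
    return [rng.randint(0, 1) for _ in range(n)]

def format_record(spins, group=5):
    """Write the spins in the order they occurred, in groups of `group` digits."""
    digits = "".join(str(s) for s in spins)
    return " ".join(digits[i:i + group] for i in range(0, len(digits), group))

def relative_frequency(spins):
    """Proportion of heads in the record."""
    return sum(spins) / len(spins)

if __name__ == "__main__":
    record = spin_record(10_000)
    print(format_record(record[:25]), "...")
    print("proportion of heads:", relative_frequency(record))
```

Keeping the record as an ordered list, and not merely the count of heads, preserves the sequence for later reference, as the Example recommends.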

It is instinctive to look for some cause for this 
approximate equality of frequency in heads and tails, 
and natural to locate this cause as somehow resident in 
the two-sided nature and appreciable symmetry of the 
coin ; or to ascribe the approximate equality of frequency 
of the faces of the die to its six-sided and nearly uniform 
configuration. Simple ideas such as these suggest by 
generalisation and abstraction the axioms of probability ; 
but the choice of axioms may be made in various ways,
which lead to different formulations of the theory of 
probability. 

3. Survey of Various Definitions of Probability. 

No single particular definition of probability has so far 
met with predominating acceptance. The requisites of a 
satisfactory basis would be these : breadth of application, 
sufficient closeness to the intuitions in which the concept 
originates, and freedom from excessive complexity or 
abstruseness. No theory as yet proposed has been able 
to make these requisites compatible. We may survey 
some contrasting standpoints. 

Probability as the Logic of Uncertain Inference. 
One view is that probability may be regarded as a kind 
of extension of classical logic, an extension conveniently 
described as the “logic of uncertain inference.” This 
view has been expounded by J. M. Keynes in A Treatise 
on Probability (London, 1921), especially in Part II, 
Chapters X-XVII, where references to earlier expositions 
are given. Probability is here regarded as “the degree
of our rational belief” in the truth of a given proposition,
such belief being contingent on a body of relevant know- 
ledge. A logical algebra is developed, but the theorems 
are stated in symbolic, not in numerical or metrical terms, 
and can be applied to the objective problems of statistics 
only by an abrupt and dubious transition from the symbolic 
to the metrical. 

Probability a Priori, and Probability as Relative 
Frequency. As our simple illustrations of the coin and 
the die have suggested, the crude intuition of probability 
rests on the observation that when a given set of circum- 
stances S, such as a symmetrical coin spun rapidly, has 
been present on numerous occasions in the past, it has 
been associated in a nearly constant proportion of those 
occasions with some event E, such as the fall of “ heads.” 

The a priorist theory directs attention to the set of
circumstances S, or rather to the invariant part of S.
In many spins of a coin or die something remains unchanged, 
namely those properties which describe the coin or die 
as a rigid constant configuration. The a priorist will 
regard the probabilities of falls 1, 2, 3, 4, 5, 6 of a die 
as some part of the description of the die, as measuring 
indeed some quality resident in the structure of the die, 
before any spinning is performed. Now the classical 
a priori definition took account only of a very limited
class of “systems” S, namely those possessing symmetry,
in the sense that the different aspects (such as faces 1, 2, 
3, 4, 5, 6 of the die) were presumed physically indistin- 
guishable. Such an assumption is an idealization of the 
facts, for we can never hope to test completely the 
symmetry of any actual coin or die ; not only would 
the tests be infinitely many and impossibly delicate, but 
the concept of the rigidity and permanence in time of a 
material body is not sustained by modern physics. How- 
ever, symmetry being presumed, the six faces 1, 2, 3, 4, 
5, 6 were characterized as “equally likely” to be found
uppermost after any throw, and the probability of 1/6
was attributed to each of these “events.” More generally,
if n equally likely aspects of a proposed system S were
discriminated, m of these being favourable to the event E,
the probability of E with respect to S was defined as
p(E ; S) = m/n.

Criticism is easy. The logician will not fail to pounce 
upon the words “equally likely,” pointing out that they are
synonymous with “equally probable,” and that therefore
probability is being defined by what is probable, a circulus 
in definiendo being thus committed. Postponing the 
defence, we may pass on to inquire what could be the 
definition of probability, should the tests have disclosed 
asymmetry in S. The inquiry is most pertinent, for the 
heterogeneous and the asymmetrical are the prevalent 
order of nature, the homogeneous and the symmetrical 
being the exception. One has no difficulty for example in
conceiving a die which might be an irregular hexahedron,
heterogeneous in density and with non-parallel and unequal
opposite edges and faces. Such dice, and more complicated 
asymmetrical systems, have been subjected to repeated 
trials, which have shown a tendency of relative frequency 
of falls towards a constancy resembling that observed in 
symmetrical systems. 

Stability of Relative Frequency. Another view
from the angle of “common sense,” in some respects
antithetical to the view just mentioned, is the frequency
view. Here the invariability of the configurative part
of S, whether symmetrical or unsymmetrical, is tacitly
assumed, and attention is concentrated upon the sequence
of trials, and the incidence of E in these. For example,
the die is thrown again and again. When E occurs, let
us write 1; when E does not occur, let us write 0. A
succession of n trials then gives a sequence

\[ A = a_1, a_2, a_3, \ldots, a_n, \tag{1} \]

each $a_j$ being 1 or 0.

Let m be the number of 1's in this sequence. A very
limited experience, such as spinning a coin or die 10 times
on several occasions, will show that in a finite number n
of trials made upon the same system S on two or more
occasions, different values of m are not only possible but
usual. Thus, if E is the throw of an ace with a single
die, 100 throws may on one occasion give m = 15 and
on another occasion give m = 20. It follows that in order
to define a probability p(E ; S) which shall be unique and
not discordant with experience, we must idealize once
again, postulating a limiting process as n tends to infinity
and writing

\[ \lim_{n \to \infty} m/n = p(E\,;\,S). \tag{2} \]
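The variation of m from one occasion to another, and the steadying of m/n as n grows, may be imitated numerically. The following is a rough sketch under stated assumptions: a pseudo-random generator stands in for the die, each face is taken as equally likely, and the names `count_aces`, `occasions` and `frequency_ratios` are hypothetical. No finite computation can of course exhibit the limit itself.

```python
import random

def count_aces(n, rng):
    """Number m of aces in n simulated throws of a symmetrical die."""
    return sum(1 for _ in range(n) if rng.randint(1, 6) == 1)

def occasions(n, how_many, seed=0):
    """Values of m observed on several separate occasions of n throws each."""
    rng = random.Random(seed)
    return [count_aces(n, rng) for _ in range(how_many)]

def frequency_ratios(sizes, seed=0):
    """Relative frequency m/n of aces for each number of throws in `sizes`."""
    rng = random.Random(seed)
    return {n: count_aces(n, rng) / n for n in sizes}

if __name__ == "__main__":
    print("m in 100 throws, on 10 occasions:", occasions(100, 10))
    for n, ratio in frequency_ratios([100, 10_000, 200_000]).items():
        print(f"n = {n:>7}:  m/n = {ratio:.4f}")
```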

This is in fact a definition, supported by a certain school 
of statisticians, based upon the limit of frequency ratio 
or relative frequency m/n. Though at first sight attractive,
it fades a little on scrutiny. Granted the postulate of this
limit p for one sequence of trials upon S, can we accept
the more stringent postulate that the same limiting value
p is obtained for any other infinite sequence of trials on
S? Not without further assumptions, for one might
imagine a mechanism sufficiently delicate to throw heads
with a coin, or an ace with a die, on almost all occasions.
There is therefore some restriction on the manner of
throwing, or on the initial state of S. This restriction
is usually stated in the form of a condition that successive
throws must be “random,” but this merely transfers the
burden of explanation to a new and undefined concept,
“randomness.” To discuss various attempts to define
randomness would take us too far afield. It is easy to
say that randomness is absence of any law; but what is
“law” in this connexion?

Another difficulty is that the tendency of relative 
frequency m/n towards a limit p is different in nature 
from the corresponding tendency to a limit which mathe- 
maticians have discerned and used in the infinite sequences 
of mathematical analysis. To take a classical example,
in the sequence defining a certain simple geometric series,

\[ 1,\quad 1-\tfrac12,\quad 1-\tfrac12+\tfrac14,\quad 1-\tfrac12+\tfrac14-\tfrac18,\quad \ldots, \tag{3} \]

the deviations of the successive terms from 2/3 are respectively
1/3, −1/6, 1/12, −1/24, ..., each being numerically half its
predecessor, so that, given a small number ε, such as 1/1000000,
we can always find some term sufficiently far along the
sequence, after and including which all terms deviate from
2/3 by less than ε. Thus 2/3 is the limit of this sequence.
But what can be asserted concerning the sign and magnitude
of the deviation $\varepsilon_n$, considered as a function of n, in

\[ \varepsilon_n = m/n - p(E\,;\,S)\,? \]

It would seem that the only kind of assertion about $\varepsilon_n$
which would carry conviction would itself involve somewhere
the notion of probability; and here the risk of
committing a circle in definition again raises its head.
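The behaviour of the geometric sequence, in contrast with that of a frequency ratio, can be verified exactly. The sketch below uses exact rational arithmetic; the names `partial_sums`, `LIMIT` and `first_index_within` are our own, introduced only for illustration.

```python
from fractions import Fraction

LIMIT = Fraction(2, 3)

def partial_sums(count):
    """First `count` terms of the sequence 1, 1 - 1/2, 1 - 1/2 + 1/4, ..."""
    terms, total, summand = [], Fraction(0), Fraction(1)
    for _ in range(count):
        total += summand
        terms.append(total)
        summand *= Fraction(-1, 2)  # next summand: half the size, opposite sign
    return terms

def first_index_within(eps):
    """Smallest k (counting from 1) whose term, and every later term,
    deviates from 2/3 by less than eps."""
    k, deviation = 1, Fraction(1, 3)  # |deviation| of the k-th term is (1/3) / 2**(k-1)
    while deviation >= eps:
        k += 1
        deviation /= 2
    return k

if __name__ == "__main__":
    print([str(s - LIMIT) for s in partial_sums(4)])   # deviations of the first four terms
    print(first_index_within(Fraction(1, 1_000_000)))  # 20
```

For ε = 1/1000000 the twentieth term is the first to qualify; the point of the passage is that no comparable rule connecting n with ε is available for the frequency ratio m/n.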

It should be added that the chief defects of the approach 
to probability by limit of frequency ratio have lately been 
removed by the work of von Mises, Copeland, Dörge, Wald
and others. These writers admit only certain sequences 
A of suitable postulated properties, including that of 
limiting ratio ; but some logical difficulties remain, and 
the modified formulations lose the primitive simplicity in 
which they originated. 

It would seem, however, that a more natural course, 
and one more in line with the general method of science, 
would be to try to explain the effect, namely the relative 
frequency of E, by an analysis of the cause, namely the 
system S. This suggests a return to the a priori stand-
point; and it may be noted that several authors at the
present time, Fréchet, Kolmogoroff, Cramér and others,
have been independently engaged in rehabilitating the
a priori definition by furnishing it with a better axiomatic
basis.

4. Probability as Measure of a Sub-Aggregate. 

Let us examine more closely the system S, keeping some
simple system such as a coin or die in mind. The
approximately constant element in our sequences A,
namely the almost stable frequency ratio of E, must
reflect, at least so our intuition suggests, the constant
element of S, such as the rigid configuration of a coin
or die; the irregularity which we name randomness
doubtless reflects the variable part of S, such as the
initial position, velocity and angular velocity of projection.

What is S when an unsymmetrical and heterogeneous
die is spun and falls? It consists of (i) the die, specified
as a particular constant rigid body, (ii) the floor or table
on which it may impinge or finally rest, (iii) the surrounding
air, and so on ; together with (iv) the circumstances of 
projection, described by coordinates of initial position,
momentum and angular momentum. The coordinates
specifying the rigidity of the die and the configuration 
of the table or floor are constant components of S, the
other initial coordinates of S are variable. The set of
coordinates of S at the instant of projection may be
called the initial phase. Each variable coordinate, such
as the initial position, or the initial momentum, has a
certain field of variation. Hence we must assume a set
of possible phases which, if they can be enumerated in
some order, may be designated by $S_1, S_2, \ldots, S_i, \ldots$;
and this ensemble of possible initial phases $S_i$ constitutes
an aggregate S of the kind specially studied in pure
mathematics.* If dynamical determinism be assumed,
but not otherwise, the initial phase will decide whether
or not the event E will occur. Consequently the possible
initial phases may be classified as E-phases or not-E-phases
(let us say $\bar{E}$-phases), so that the whole phase aggregate is
divided into two sub-aggregates. Now the question of
assigning a measure to such aggregates has been deeply
studied in modern pure mathematics, the guiding idea
being that of extending as widely as possible the scope
of a concept familiar in simple cases, namely the cardinal
number of a finite set of objects, the length of a line, the
area of a surface, the volume of a solid. If M is the
measure of the whole aggregate S of possible phases, and
pM the measure of the aggregate of E-phases contained
in it, then p is the probability p(E ; S).

Something has been glossed over here ; there is the 
tacit assumption that the initial phases are “equally
likely.” But let us insist that the question of equal
likeliness is not one for the abstract formulation at all;
for to specify the aggregate is in effect to say that its
elements, the initial phases, are equally likely. For
example, if the aggregate were of points on a continuous
line segment, and the measure were ordinary length, then 

* We use the same letter S as before, regarding the system 
now as the totality of its possible phases. 
we have implied in this description that all points in the
segment are equally likely. On the other hand, the question 
of equal likeliness is crucial in the application to experiment 
or observation, that is, in applied statistics, where a 
wrong choice of the aggregate may alter all the pro- 
babilities. This has long been known in problems of 
so-called geometrical probability. For example, given a 
circle, let a chord be drawn across it at random : what 
is the probability that the length of the chord exceeds 
half the diameter ? It depends entirely on the manner 
in which the chord is drawn. If it is done by taking a 
point on the circumference and then drawing the chord
at any angle, all angles being thus supposed equally likely,
then the probability is 2/3; but if it is done by taking
any diameter and drawing the chord at right angles to
any point taken in the diameter, the diameters and points
being equally likely, then the probability is √3/2.
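The two answers may be checked roughly by simulation. The sketch below is only an illustration under our own assumptions: a circle of unit radius, a pseudo-random generator with a fixed seed, and hypothetical function names. By symmetry the fixed point on the circumference and the chosen diameter need not themselves be varied.

```python
import math
import random

def chord_by_angle(rng, radius=1.0):
    """Chord through a point of the circumference, drawn at an angle to the
    diameter through that point, all angles being equally likely."""
    theta = rng.uniform(-math.pi / 2, math.pi / 2)
    return 2 * radius * math.cos(theta)

def chord_by_diameter_point(rng, radius=1.0):
    """Chord at right angles to a diameter, through a point taken on the
    diameter, all points of the diameter being equally likely."""
    d = rng.uniform(-radius, radius)  # signed distance of the point from the centre
    return 2 * math.sqrt(radius**2 - d**2)

def estimate(method, trials=200_000, seed=1, radius=1.0):
    """Proportion of chords whose length exceeds half the diameter."""
    rng = random.Random(seed)
    hits = sum(method(rng, radius) > radius for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    print("by angle:            ", estimate(chord_by_angle), " (2/3 =", 2 / 3, ")")
    print("by point on diameter:", estimate(chord_by_diameter_point),
          " (sqrt(3)/2 =", math.sqrt(3) / 2, ")")
```

The two functions differ only in which aggregate of phases is taken as equally likely, which is precisely the point at issue.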

The inclusion of the words “ equally likely ” in a definition 
is in fact a concession ; it puts the reader more gently at 
terms with the abstract formulation by anticipating its chief 
future application. The usage is not uncommon. When a 
point is defined as “that which has position but no magnitude”
the same appeal is made to an application, but the same
suspicion of a circle in definition is incurred, for how can
position be defined without the notion of a point ? And if a 
straight line is defined as “ lying evenly ” between its extreme 
points, what else does “ evenly ” mean but “ in a straight 
line ” ? Every definition which is not pure abstraction must 
appeal somewhere to intuition or experience by using some 
such verbal counter as “ point,” “ straight line ” or “ equally 
likely,” under the stigma of seeming to commit a circle in 
definition. 

This prologue, though it has omitted many subtler 
points which could be amplified at very great length, 
must now be cut short. To summarize : (i) events E 
are conceived as associated with, or caused by, phases 
$S_i$ of circumstances; (ii) each $S_i$ gives rise unambiguously
either to E or to $\bar{E}$; (iii) the phases $S_i$ form in their
totality a set or aggregate S, of which the phases favourable
to E, and those favourable to $\bar{E}$, form complementary
subsets; (iv) a measure M can be given to the whole set
S, and if pM is the measure of the subset favourable to E,
then p is the probability p(E ; S) of E with respect to S;
(v) the question of equal likeliness of phases is the same
as the question of specifying the aggregate and its measure,
and in practical applications this must be determined by
the circumstances of the particular problem. Let us
finally add that the word phase can be extended to include
coordinates other than dynamical ones; also that the
name “fundamental probability set” is used by some
writers for the set S of phases $S_i$.

5. Definition of Probability. In an elementary
treatment a rigorous formulation in terms of general 
aggregates is not possible. It will be necessary to restrict 
consideration to aggregates with a finite number of elements 
only ; in this case the measure of an aggregate or sub- 
aggregate is simply the number of elements it contains. 
The reader may take it that the theorems can be extended 
to more general aggregates. 

Definition. If an event E can result from the phases
of a system S, there being n different phases and no more,
all equally likely a priori; and if m of these phases entail
the occurrence of E (so that n − m do not), then m/n is the
probability p(E ; S) of E with respect to S.
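For a finite system the definition amounts to counting. A minimal sketch, assuming the phases are supplied as a finite collection presumed equally likely; the name `probability` and the use of exact fractions are our own choices for illustration.

```python
from fractions import Fraction

def probability(phases, entails_event):
    """p(E ; S) = m/n for a finite system S of n equally likely phases,
    m of which entail the event E."""
    phases = list(phases)
    n = len(phases)
    m = sum(1 for phase in phases if entails_event(phase))
    return Fraction(m, n)

die = range(1, 7)                                   # the six faces as phases
p_ace = probability(die, lambda face: face == 1)
p_even = probability(die, lambda face: face % 2 == 0)

if __name__ == "__main__":
    print("p(ace)  =", p_ace)    # 1/6
    print("p(even) =", p_even)   # 1/2
```

The assumption of equal likeliness enters only through the choice of the collection passed as `phases`; the function itself merely counts.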

Continuous Case. If the event E is described by
the value of a continuous variable x, we may denote the
probability that x is found between $x+\tfrac12\Delta x$ and $x-\tfrac12\Delta x$ by

\[ p(x+\tfrac12\Delta x,\; x-\tfrac12\Delta x\,;\,S) = \Delta p(x\,;\,S), \tag{1} \]

let us say. By supposing n to tend to infinity and Δx
to tend to zero we reach the conception of a differential
element of probability, or probability differential,

\[ p(x+\tfrac12 dx,\; x-\tfrac12 dx\,;\,S) = dp(x\,;\,S), \tag{2} \]

which, when no misunderstanding about S is likely to
arise, we shall often denote briefly by dp.

Complementary Event. The failure of E is denoted
by $\bar{E}$ and is called the complementary event. The pro-
bability of $\bar{E}$ is (n − m)/n, namely 1 − p in the finite case,
and likewise in the continuous case. This is often termed
the complementary probability and denoted by q, so that

p + q = 1.

If n is finite and if E must inevitably happen in all
of the n ways, then p = 1 and E is “certain,” while
q = 0 and $\bar{E}$ is “impossible.” If, however, the system
S depends on a non-finite set or results in events expressible
by a continuous variable, we must not suppose that p = 1
implies certainty, or p = 0 impossibility. For example, if
a point is taken on a line segment, the chance of a particular 
point P being taken is 0 ; but some point is taken, and so 
the point P cannot be regarded as impossible. 

6. Addition and Multiplication of Probabilities.

Dependent and Independent Events. An event F will
be said to be dependent on an event E when the happening
of either E or $\bar{E}$ alters the probability of F; and in the
contrary case F will be said to be independent of E. An
extreme case of dependence is that in which the happening
of either E or F makes the probability of the other equal
to zero. The events are then said to be mutually exclusive.
(In the continuous case we must take cognizance of “almost
mutually exclusive” and “almost independent” events, just
as we have of “almost impossible” events for which p = 0.)

The addition theorem of probability is applicable to 
events which are mutually or almost mutually exclusive. 

Theorem. When an event E may happen in the
form of any one of r mutually exclusive events $E_j$,
j = 1, 2, 3, ..., r, in a system S which has n equally likely
phases, the probability of $E_j$ being $p_j$, then the probability
of E is

\[ p(E\,;\,S) = p_1 + p_2 + \cdots + p_r = \sum_j p_j. \tag{1} \]

Proof. If $n_j$ of the n phases entail $E_j$, then $p_j = n_j/n$.
Since the phases do not overlap (otherwise the events
would not be mutually exclusive) the total number of
phases entailing one or other of the $E_j$ is $\sum_j n_j$; and so

\[ p(E\,;\,S) = \sum_j n_j/n = \sum_j p_j. \]

The theorem, which is sometimes called the theorem of 
Total Probability, continues to hold for systems expressed 
by a non-finite or by a continuous variable.

The multiplication theorem, or theorem of Compound 
Probability, refers in the first instance to independent 
events, but can easily be made applicable, with a suitable 
definition of conditioned probability for dependent events, 
to the latter case. 

Theorem. If $E_j$, j = 1, 2, 3, ..., r, are r independent
events, each with respect to its own system $S_j$, the
probability that they all happen when all the $S_j$ are in
operation is

\[ p(E\,;\,S) = p_1 p_2 \cdots p_r, \tag{2} \]

where E denotes the compound event consisting in the
happening of all the $E_j$, S denotes the compound system
consisting in the operation of all the $S_j$, and $p_j = p(E_j\,;\,S_j)$.

Proof. Let $n_j$ denote the number of phases of $S_j$,
and of these let $m_j$ entail $E_j$. Now each of the $n_j$ phases
of $S_j$ may be paired in turn with each of the $n_k$ phases of
$S_k$, giving rise to $n_j n_k$ compound phases of the double
system $(S_j, S_k)$. By similar reasoning the $m_j$ phases
entailing $E_j$ may be paired in turn with the $m_k$ phases
entailing $E_k$, giving rise to $m_j m_k$ compound phases of
$(S_j, S_k)$ entailing the double event $(E_j, E_k)$.

By similar reasoning, or step by step, there are
altogether $n_1 n_2 \cdots n_r$ phases of the compound system
$(S_1, S_2, \ldots, S_r) = S$, and of these $m_1 m_2 \cdots m_r$ entail the
compound event $(E_1, E_2, \ldots, E_r) = E$. Hence the
probability of E with respect to S is

\[ p(E\,;\,S) = \frac{m_1 m_2 \cdots m_r}{n_1 n_2 \cdots n_r} = p_1 p_2 \cdots p_r. \]

Once again we must content ourselves with the state-
ment that the theorem remains true for independent or
“almost independent” systems involving infinite aggregates
or continuous variables.

By modifying the definition of $p_2, p_3, \ldots, p_r$ we
may prove an analogous theorem for a chain of events
$E_1, E_2, \ldots, E_r$, each of which influences the probability
of its successors.

Let $p_2 = p(E_2\,;\,E_1, S_2)$ denote the probability of $E_2$
after $E_1$ has happened, $p_3 = p(E_3\,;\,E_1, E_2, S_3)$ denote the
probability of $E_3$ after $E_1$ and $E_2$ have happened, and so
on. Slight consideration will show that this simply
involves putting the events in an order of time and that
then, with the new interpretation of $p_2, p_3, \ldots, p_r$, the
above proof proceeds exactly as before. Hence we have
the theorem of compound probability for a chain of
conditioned events:

\[ p(E\,;\,S) = p(E_1)\,p(E_2\,;\,E_1)\,p(E_3\,;\,E_1, E_2) \cdots. \tag{3} \]

These theorems of addition and multiplication of 
probabilities are the fundamentals upon which the mathe- 
matical theory of statistics is raised. Since addition and 
multiplication are operations of ordinary algebra, we may 
anticipate that there is an algebra of probability depend-
ing on these operations, according to which expressions
representing independent systems $S_j$ can be compounded
in product and the resulting probabilities found by
inspection of terms. This algebra is the algebra of
generating functions of probability, which we shall consider
from an elementary standpoint in the next section. 

Ex. 1. The probability of throwing two consecutive aces
with a true die is 1/36.

Ex. 2. The probability of throwing a head and a tail with
two coins is 1/2.

Ex. 3. The probability of throwing a total of 8 points 
with two dice is 5/36. (The mutually exclusive events are 
6+2, 5+3, 4+4, 3+5, 2+6.) 

Ex. 4. A bag contains 4 black and 3 white balls. Show 
that the probability of drawing 3 black in succession is 64/343 
if the ball drawn is replaced each time, 8/49 if the first ball 
drawn is replaced but not the others, 4/35 if no ball is replaced. 

Ex. 5. The events $E_1$ and $E_2$ are neither independent nor
mutually exclusive. Denote by $p_{12}$ the probability that $E_1$
and $E_2$ both happen. Prove that the probability that at
least one of $E_1$ and $E_2$ happens is $p_1 + p_2 - p_{12}$.

Ex. 6. Generalize the preceding theorem to r events
$E_1, E_2, \ldots, E_r$. Prove, with an analogous notation, that the
probability that at least one of the events happens is

\[ \sum p_j - \sum p_{jk} + \sum p_{jkl} - \cdots + (-1)^{r-1} p_{12 \ldots r}. \]
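The numerical results quoted in Ex. 1 to Ex. 4 may be confirmed by direct enumeration and by the chain form (3) of the multiplication theorem. The sketch below is an illustration only; the helper `probability` and the variable names are ours, and the phases enumerated are assumed equally likely.

```python
from fractions import Fraction
from itertools import product

def probability(phases, entails_event):
    """m/n for a finite collection of equally likely phases."""
    phases = list(phases)
    return Fraction(sum(1 for ph in phases if entails_event(ph)), len(phases))

faces = range(1, 7)
two_throws = list(product(faces, repeat=2))   # 36 compound phases of two throws

ex1 = probability(two_throws, lambda t: t == (1, 1))      # two consecutive aces
ex2 = probability(product("HT", repeat=2),
                  lambda c: set(c) == {"H", "T"})          # a head and a tail
ex3 = probability(two_throws, lambda t: sum(t) == 8)       # total of 8 points

# Ex. 4: bag of 4 black and 3 white balls, three black drawn in succession.
with_replacement = Fraction(4, 7) ** 3
first_only_replaced = Fraction(4, 7) * Fraction(4, 7) * Fraction(3, 6)
no_replacement = Fraction(4, 7) * Fraction(3, 6) * Fraction(2, 5)

if __name__ == "__main__":
    print(ex1, ex2, ex3)                                           # 1/36 1/2 5/36
    print(with_replacement, first_only_replaced, no_replacement)  # 64/343 8/49 4/35
```

In Ex. 4 each factor after the first is a conditioned probability: the contents of the bag at each draw depend on what has already been drawn and not replaced.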

7. Generating Functions of Probability. We shall
often denote the probability that a variable x takes a
particular value by φ(x), and we shall use the following
nomenclature:

Probability Function. The function φ(x) is the
probability function. When the set of values of x is
continuous we shall write the probability differential
dp = φ(x)dx for the probability that x is found in the
range $(x-\tfrac12 dx,\; x+\tfrac12 dx)$. In this case φ(x) is often called
the probability density.

Variate. A variable which has a probability function
will be called a variate.

Generating Function. Associated with φ(x) we
introduce the generating function (g.f.) of probability,
defined by

\[ G(t) \equiv G(t\,;\,\phi) = \sum_j \phi(x_j)\, t^{x_j}, \tag{1} \]

for variables which take discrete values, and by

\[ G(t) = \int \phi(x)\, t^{x}\, dx \tag{2} \]

for continuous variables, the integral being over the whole
range of possible values of x.

Ex. 1. The generating function of probability for heads
in a symmetrical coin is $\tfrac12 t + \tfrac12$.

Ex. 2. The g.f. for a symmetrical six-sided die is
$\tfrac16(t + t^2 + t^3 + t^4 + t^5 + t^6)$.

Ex. 3. The g.f. for an unsymmetrical coin in which the
probability of heads is p, of tails q, is pt + q.

Ex. 4. Write down the g.f. for a symmetrical four-sided
die; also for an unsymmetrical one in which the probabilities
of faces marked 1, 2, 3, 4 are $p_1, p_2, p_3, p_4$.

Ex. 5. If all points on the straight line from x = 0 to
x = 1 are equally probable, the g.f. is

\[ \int_0^1 t^x\,dx = (t-1)/\log_e t. \]
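The definitions of the generating function may be tried out numerically. In the sketch below a discrete probability function is represented, by our own convention, as a mapping from each value of x to its probability; the names `gf` and `gf_uniform_numeric`, and the midpoint rule used for the integral of Ex. 5, are assumptions made for illustration.

```python
import math
from fractions import Fraction

def gf(prob_function):
    """Return G(t) = sum of phi(x_j) * t**x_j for a discrete probability
    function given as a dict {x_j: phi(x_j)}."""
    def G(t):
        return sum(p * t**x for x, p in prob_function.items())
    return G

coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}          # x = number of heads in one spin
die = {face: Fraction(1, 6) for face in range(1, 7)}   # x = face uppermost

G_coin, G_die = gf(coin), gf(die)

def gf_uniform_numeric(t, steps=1000):
    """Midpoint-rule approximation to the integral of t**x over 0 <= x <= 1."""
    h = 1.0 / steps
    return sum(t ** ((k + 0.5) * h) for k in range(steps)) * h

if __name__ == "__main__":
    print(G_coin(1), G_die(1))                           # both 1: total probability
    print(G_die(2))                                      # 21
    print(gf_uniform_numeric(2.0), (2.0 - 1.0) / math.log(2.0))
```

Putting t = 1 in any generating function gives the sum of all the probabilities, which must be unity; this is a convenient check on a g.f. written down by hand.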

8. Properties of Generating Functions. Suppose
first that we have an event $E_1$ and its complement $\bar{E}_1$,
of respective probabilities $p_1$ and $q_1$, and a second indepen-
dent event $E_2$ with its complement $\bar{E}_2$, of probabilities
$p_2$ and $q_2$. Then the compound probabilities of the four
mutually exclusive events

\[ (E_1, E_2),\quad (E_1, \bar{E}_2),\quad (\bar{E}_1, E_2),\quad (\bar{E}_1, \bar{E}_2) \tag{1} \]

are respectively $p_1p_2$, $p_1q_2$, $q_1p_2$, $q_1q_2$. Let us relate these
to the terms on the right of the algebraic identity 

study of this identity will reveal the most important 
property of generating functions. The disjunction between 
added terms (those linked by plus signs), both in the factors 
on the left and in the expanded product on the right, 
reflects in each case the disjunction into a number of 

B 
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mutually exclusive events. The operations of multiplication, 
on the other hand, are' carried out on expressions sjnnbo- 
lizing independent events. For example, the multiplication 
of the two factors on the left interprets the compounding 
of the two independent systems 8^^ and 8^ of which they 
are the generating functions ; and the results of multi- 
plication visible in single terms on the right, such as 
PiP 4 Tf 2 >i represent at the same time the compounded 
probabilities, PiP^i ^rid the compounded events, 
characterizing In fact the algebraic operations 

are faithfully carrying out the consequences of the two 
basic theorems of probability. Mere inspection will 
convince us that this is true not only for binomial 
expressions compounded in product as above, but for 
multinomial expressions, as in the following example. 

Ex. 1. Let the reader consider events $E_1, E_2, E_3$ of probabilities
$p_1, p_2, p_3$ with respect to S, events $E'_1, E'_2, E'_3, E'_4$
with probabilities $p'_1, p'_2, p'_3, p'_4$ with respect to an independent
system S', and examine the product

$$(p_1t_1 + p_2t_2 + p_3t_3)(p'_1t'_1 + p'_2t'_2 + p'_3t'_3 + p'_4t'_4)$$

in relation to the 12 events of the compound system
(S, S').

Regarding a compound system (S, S') as a single
system and introducing further independent systems one
at a time, we may prove step by step that to find the
respective probabilities of all the mutually exclusive events
arising from the compounding of r independent systems,
we must construct the product of r expressions of the
kind exemplified above, and examine the individual terms
of the expansion.

Ex. 2. In an expansion of three factors such a term as
$p_i p'_j p''_k\, t_i t'_j t''_k$ is interpreted as meaning that the compound
event $(E_i, E'_j, E''_k)$ has probability $p_i p'_j p''_k$.





The variables $t_1$, $t_2$ and so on are introduced for the sole
purpose of preventing the terms from being merged
together ; for when the p's are explicit fractions such as
$\tfrac12$, $\tfrac13$ and the like some such device is needed.

Now suppose the event $E_j$ involves the addition of $x_j$
points to a score, or the assumption by an additive variate
x of an increment $x_j$. In such a case we represent $E_j$ by $p_j t^{x_j}$
rather than by $p_j t_j$, taking advantage of the fact that when
expressions like $p_j t^{x_j}$ and $p_k t^{x_k}$ are multiplied together we
have by the law of indices

$$p_j t^{x_j} \cdot p_k t^{x_k} = p_j p_k\, t^{x_j + x_k},$$

the probabilities being multiplied as they ought to be, and the
increments $x_j$ and $x_k$ being added as they ought to be. With this
understanding, the system under which x may assume
values $x_j$ with probabilities $p_j$, j = 1, 2, ..., r, is characterized
by the expression

$$\sum_j p_j\, t^{x_j}. \qquad (3)$$

But this is merely the generating function G(t) of the
system, and so we infer the important theorem, for
discrete variates in finite sets :

The g.f. of a compound of independent systems is the
product of the g.f.'s of the separate systems.

By a limiting process, with due precautions on the
functions concerned, this multiplicative law can be extended
to g.f.'s involving continuous variables. Thus, if $G_1(t)$ is
the g.f. of the variable x, and $G_2(t)$ of a statistically
independent variable y, then $G_1(t)G_2(t)$ is the g.f. of x+y ;
and so for more than two variables.
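
The multiplicative law lends itself to direct computation, since multiplying two g.f.'s amounts to multiplying polynomials. The sketch below is an illustration under stated assumptions and not the author's procedure : a g.f. is stored as a mapping from exponent to coefficient, and the names `gf_product` and `die` are hypothetical. It compounds three symmetrical coins, and then a tetrahedral, a cubical and an octahedral die.

```python
from fractions import Fraction

def gf_product(g1, g2):
    """Multiply two g.f.'s stored as {exponent: coefficient} mappings."""
    out = {}
    for x1, p1 in g1.items():
        for x2, p2 in g2.items():
            # probabilities are multiplied, increments are added (law of indices)
            out[x1 + x2] = out.get(x1 + x2, 0) + p1 * p2
    return out

half = Fraction(1, 2)
coin = {1: half, 0: half}          # the g.f. (1/2)t + 1/2

three_coins = gf_product(gf_product(coin, coin), coin)

def die(faces):
    """g.f. of a symmetrical die numbered from 1 upwards."""
    return {x: Fraction(1, faces) for x in range(1, faces + 1)}

# tetrahedral, cubical and octahedral dice thrown together
three_dice = gf_product(gf_product(die(4), die(6)), die(8))
```

The coefficients in `three_coins` are the probabilities of 0, 1, 2, 3 heads, and the exponents present in `three_dice` run over the possible totals of the three dice.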

Ex. 3. The probabilities of 3 heads, 2 heads, 1 head and
no heads in a throw of three symmetrical coins (or three
separate throws of one coin) are the coefficients of $t^3$, $t^2$, t and 1
in the expansion of $(\tfrac12 t + \tfrac12)^3$, namely $\tfrac18$, $\tfrac38$, $\tfrac38$, $\tfrac18$ respectively.

Verify this also by enumeration of cases. (Write H for head,
T for tail ; then the cases are HHH ; HHT, HTH, THH ;
HTT, THT, TTH ; TTT.)

Ex. 4. The corresponding probabilities when the coin is
unsymmetrical, with probability p for heads and q for tails,
are the coefficients in the expansion of $(pt + q)^3$.

Ex. 5. The probabilities when the three coins are
unsymmetrical are the coefficients in the expansion of
$(p_1t + q_1)(p_2t + q_2)(p_3t + q_3)$.

Ex. 6. The probabilities of n, n-1, ..., 2, 1, 0 heads in
n throws of an unsymmetrical coin are the coefficients of
powers of t in the expansion of $(pt + q)^n$.

Ex. 7. Write down the corresponding g.f. for the 
simultaneous throw of n different unsymmetrical coins. 

Ex. 8. A tetrahedral, a cubical and an octahedral die, all
symmetrical, are thrown together, their faces being numbered
in each case from 1 upwards. Show that the probabilities of
totals 3, 4, ..., 18 are arrayed by coefficients in the expansion
of

$$\tfrac{1}{192}\,(t + t^2 + t^3 + t^4)(t + t^2 + \ldots + t^6)(t + t^2 + \ldots + t^8).$$

Ex. 9. A coin is thrown n times. Each time a head
occurs, 2 is added to the score ; each time a tail occurs, 1 is
subtracted. The g.f. is $(pt^2 + qt^{-1})^n$, where p and q are the
probabilities of head and tail.

Ex. 10. Four tickets marked 00, 01, 10, 11 respectively
are placed in a bag, and drawn one at a time, being replaced
each time. Prove that the chance of drawing five times and
obtaining ticket numbers summing to 23 is the coefficient of
$t^{23}$ in the expansion of $(\tfrac14)^5 (1 + t + t^{10} + t^{11})^5$.

Find this coefficient, and verify the result by enumeration.

9. Moments and Moment Generating Functions.

It is convenient to describe a probability function $\phi(x)$
by certain coefficients or parameters connected with it,
such as moments, seminvariants and others later to be
defined. The moments commonly employed are based on
powers of x, and are defined by

$$\mu'_r = \sum x^r \phi(x) \quad\text{or}\quad \int x^r \phi(x)\, dx, \qquad (1)$$

according as the variate is discrete or continuous. The
summation or integration is over the whole range of
possible values of x. If the values which x can take are
discrete and spaced at unit intervals (for example if x
records the number of heads in n throws of a coin) it is
mathematically preferable to use factorial moments, defined
by

$$\mu'_{(r)} = \sum x^{(r)} \phi(x), \quad\text{where}\quad x^{(r)} = x(x-1)(x-2)\ldots(x-r+1). \qquad (2)$$

Note. The privilege often accorded to ordinary “ power ” 
moments is one of custom only ; no special sanctity attaches 
to them. 

Mathematical Expectation. If f(x) is a function of
x, and $\phi(x)$ is the probability function, or $\phi(x)\,dx$ the
probability differential, then the sum or integral

$$\sum f(x)\phi(x) \quad\text{or}\quad \int f(x)\phi(x)\, dx \qquad (3)$$

is called the mathematical expectation of f(x). It is often
denoted by $E f(x)$. The moment $\mu'_r$ is therefore the
mathematical expectation of $x^r$.

Moment Generating Functions. If we put $t = e^{\alpha}$
in the g.f. of probability G(t), we obtain

$$G(e^{\alpha}) = \sum_x \phi(x)\, e^{\alpha x} \quad\text{or}\quad \int \phi(x)\, e^{\alpha x}\, dx \qquad (4)$$
$$= 1 + \mu'_1 \alpha + \mu'_2 \alpha^2/2! + \mu'_3 \alpha^3/3! + \ldots,$$

provided that the sum or integral converges over a range
of $\alpha$ and that expansion of $e^{\alpha x}$ and integration term by
term is permissible. This function, which we shall denote
by $M(\alpha)$, may be regarded as generating the moments $\mu'_r$
in the sense that $\mu'_r$ is the coefficient of $\alpha^r/r!$ in $M(\alpha)$.
Of course $\alpha$, like t, is a variable introduced to facilitate
manipulation, in fact to carry the moments. We shall
call $M(\alpha)$ the moment generating function (m.g.f.) of x or
of $\phi(x)$, or of the system in question.
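
The sense in which the m.g.f. generates the moments can be tested on a small discrete distribution. The following sketch is illustrative only : the function names are hypothetical, the variable `a` stands for the carrier variable of the m.g.f., and the distribution chosen (four throws of a coin with probability 1/3 of heads) is an arbitrary assumption. It compares the value of the m.g.f. with a partial sum of the moment series.

```python
from fractions import Fraction
from math import comb, exp, factorial

def binomial_pf(n, p):
    """Probability function of the number of heads in n throws,
    that is, the coefficients of (pt + q)**n."""
    q = 1 - p
    return {x: comb(n, x) * p**x * q**(n - x) for x in range(n + 1)}

def moment(phi, r):
    """r-th power moment about the origin: sum of x**r * phi(x)."""
    return sum(x**r * p for x, p in phi.items())

def mgf(phi, a):
    """M(a) = G(e**a) = sum of phi(x) * e**(a*x)."""
    return sum(float(p) * exp(a * x) for x, p in phi.items())

phi = binomial_pf(4, Fraction(1, 3))
a = 0.1

# Partial sum 1 + mu'_1 a + mu'_2 a**2/2! + ... of the moment series
series = sum(float(moment(phi, r)) * a**r / factorial(r) for r in range(12))
```

For a variate with a finite range, as here, the series converges for every value of the carrier variable, so that a dozen terms already agree with `mgf(phi, a)` to within rounding error.
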

Factorial Moment Generating Functions. When
factorial moments are in question, we can construct a
factorial moment generating function (f.m.g.f.) very simply
from the probability g.f. by the substitution $t = 1 + \alpha$.
For then we have

$$G(1+\alpha) = \sum \phi(x)\,(1+\alpha)^x \qquad (5)$$
$$= 1 + \mu'_{(1)}\alpha + \mu'_{(2)}\alpha^2/2! + \mu'_{(3)}\alpha^3/3! + \ldots,$$

by expanding $(1+\alpha)^x$ by the binomial theorem and summing
the resulting terms.

Example. The f.m.g.f. of the distribution characterized by
$(pt + q)^n$ is $(1 + p\alpha)^n$.

Note. The reader who is acquainted with more advanced
mathematics may observe that for moment generating
functions the substitution $t = e^{i\alpha}$ instead of $t = e^{\alpha}$ has a
certain advantage. It gives the modified m.g.f.

$$M(i\alpha) = \int \phi(x)\, e^{i\alpha x}\, dx, \qquad (6)$$

a Fourier transform of $\phi(x)$. The integrand and integral are
bounded, and the reciprocal theorems of Fourier transforms
are available.

10. Seminvariants and Seminvariant Generating
Functions. If the logarithm of the moment generating
function $M(\alpha)$ can be expanded as a convergent series in
powers of $\alpha$ in the form

$$L(\alpha) = \log M(\alpha) = \lambda_1\alpha + \lambda_2\alpha^2/2! + \lambda_3\alpha^3/3! + \ldots, \qquad (7)$$

then $L(\alpha)$ is defined to be the seminvariant g.f., and the
coefficients $\lambda_r$ are called the seminvariants * of the function
$\phi(x)$. Since m.g.f.'s are compounded in product, s.g.f.'s
must be compounded in sum, whence the theorem :

When independent systems are compounded the r-th seminvariants
$\lambda_r$ of the separate systems are added to form the
r-th seminvariant of the compound system.

This additive property of seminvariants is indeed the
reason for introducing them. In the same way, by taking
the logarithm of the f.m.g.f. we can define a factorial
s.g.f. and factorial seminvariants.

* The word “cumulant,” suggested by R. A. Fisher, is perhaps
to be preferred, since “seminvariant” is already appropriated in
the theory of algebraic invariants.

Example. The factorial seminvariants corresponding to
$(pt + q)^n$ are $np$, $-np^2$, $2!\,np^3$, $-3!\,np^4$ and so on.
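
The additive property may be verified by exact arithmetic. In the sketch below, which is an illustration and not the author's method, the seminvariants are obtained from the power moments by a standard recursion (stated in the docstring) rather than by expanding a logarithm ; the function names are hypothetical, and a symmetrical coin and a six-sided die serve as the two independent systems.

```python
from fractions import Fraction
from math import comb

def raw_moments(phi, order):
    """Power moments mu'_0 .. mu'_order about the origin."""
    return [sum(Fraction(x)**r * p for x, p in phi.items())
            for r in range(order + 1)]

def seminvariants(phi, order):
    """First `order` seminvariants, from the power moments.

    Uses the standard recursion
    lambda_n = mu'_n - sum_{m=1}^{n-1} C(n-1, m-1) * lambda_m * mu'_{n-m}.
    """
    mu = raw_moments(phi, order)
    lam = [Fraction(0)] * (order + 1)
    for n in range(1, order + 1):
        lam[n] = mu[n] - sum(comb(n - 1, m - 1) * lam[m] * mu[n - m]
                             for m in range(1, n))
    return lam[1:]

def compound(phi1, phi2):
    """Distribution of x + y for independent x, y (product of the g.f.'s)."""
    out = {}
    for x, p in phi1.items():
        for y, q in phi2.items():
            out[x + y] = out.get(x + y, 0) + p * q
    return out

coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
die = {x: Fraction(1, 6) for x in range(1, 7)}

lam_coin = seminvariants(coin, 4)
lam_die = seminvariants(die, 4)
lam_sum = seminvariants(compound(coin, die), 4)
```

Each entry of `lam_sum` should equal the sum of the corresponding entries of `lam_coin` and `lam_die`, whereas the power moments of the compound system obey no such simple rule.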

11. Change of Origin and Scale in Generating
Functions. Change of Origin. If the origin from which
the variate x is measured is transferred from x = 0 to
x = a, any value x will be changed to x-a. Hence
every factor $t^x$ in a term of the probability g.f. will become
$t^{x-a}$ ; the accompanying probability $\phi(x)$, though
changed in notation, will not be changed in value. Hence
the effect is to multiply the whole g.f. by $t^{-a}$.

This very simple rule leads to corresponding ones for
the m.g.f., f.m.g.f. and s.g.f., namely :

A change of origin from x = 0 to x = a has the effect
of multiplying the m.g.f. by $e^{-a\alpha}$ ; of multiplying the f.m.g.f.
by $(1+\alpha)^{-a}$ ; and of adding to the s.g.f. the term $-a\alpha$.

Thus only the first seminvariant $\lambda_1$ is changed ; it
becomes $\lambda_1 - a$, while $\lambda_2, \lambda_3, \ldots$ are unaltered.

Change of Scale. If the scale of measurement is
altered so that what was previously recorded as x now
reads kx, then every factor $t^x$ in the previous g.f. now
becomes $t^{kx}$, that is, $(t^k)^x$. Hence in the m.g.f. the previous
$e^{\alpha x}$ now reads $e^{k\alpha x}$. Hence the rules :

Change of scale, so that x becomes kx, has the effect of
replacing t by $t^k$ in the probability g.f., $\alpha$ by $k\alpha$ in the m.g.f.

The immediate consequence is that the previous r-th
moment $\mu'_r$ and seminvariant $\lambda_r$ become $k^r\mu'_r$ and $k^r\lambda_r$.

The reason for the name seminvariant is now seen ; for
under a change of origin and scale in x all seminvariants after
$\lambda_1$ are altered at most by a scale factor.

Change of scale in the f.m.g.f. will be effected by
replacing $1+\alpha$ by $(1+\alpha)^k$.

Example. The first moment or mean of the distribution
which has g.f. $(pt + q)^n$ is np, and the m.g.f. with respect to
the mean as origin is $e^{-np\alpha}(pe^{\alpha} + q)^n$. The corresponding
f.m.g.f. is $(1+\alpha)^{-np}(1 + p\alpha)^n$.

12. Population, Universe or Stock,
Sample. To conclude these questions of nomenclature
and general notions we explain what is meant by population,
universe or stock, and sample. As an example let
us consider the repetition of an experiment in which the
probability of success is p = m/n, a rational fraction. We
may construct a model by taking (or imagining) n similar
objects, such as equal spherical marbles, of which m are
distinguishable from the rest, and drawing an object
repeatedly, with replacement after each drawing. Such
an assemblage, actual or hypothetical, constitutes a
population, universe or stock. It is in fact merely a model
of the system S. To cope with special cases we have
often to conceive a fictitious infinite population. For
example, if we wish to represent drawing with replacement
by a model in which the drawing is without replacement,
the population of the model will certainly have to be
infinite, since the probabilities of successive drawings are
constant, a thing which cannot happen with a finite
population.

Sample. Any element of a population is a sample
of that population. For example, if five drawings are
made, with replacement each time, from six cards numbered
1, 2, 3, 4, 5, 6, the population of possible sets of five cards
contains $6^5$ or 7776 elements, of which (3, 5, 6, 4, 1) and
(4, 4, 2, 6, 3) are two samples. If the drawing is without
replacement, the population of sets of five contains
6·5·4·3·2 or 720 elements, of which (2, 3, 5, 6, 4) and
(5, 2, 4, 3, 1) are two samples. Or again, if a coin is spun
100 times, the sequence of heads and tails arising is to be
regarded as one sample out of the possible $2^{100}$ sequences
constituting the population of sequences.
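
The two populations of sets of five cards are small enough to be enumerated outright. The following sketch, offered only as an illustration, uses the Python standard library functions `itertools.product` and `itertools.permutations` ; the variable names are arbitrary.

```python
from itertools import permutations, product

cards = [1, 2, 3, 4, 5, 6]

# Five drawings with replacement: every ordered 5-tuple is a possible sample.
with_replacement = list(product(cards, repeat=5))

# Five drawings without replacement: ordered 5-tuples of distinct cards.
without_replacement = list(permutations(cards, 5))

n_with = len(with_replacement)
n_without = len(without_replacement)
```

The sample (4, 4, 2, 6, 3), containing a repeated card, belongs to the first population but cannot belong to the second.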

The word “sample” is also used as a verb, to sample
a population meaning to draw a sample, or samples, from
that population.

Notation. It is important to distinguish the probability
$\phi(x)$, which may not be definitely known, from
the relative frequency of x as found in a sample, let us
say f(x) ; and in the same way all parameters, such as
means and moments, connected with $\phi(x)$ should be
distinguished from the corresponding parameters in the
case of f(x). As far as possible we shall make this distinction
by using Greek letters for probability functions and
parameters, italic letters for the corresponding frequency
functions and parameters. Thus if $\mu'_r$ stands for the r-th
moment of $\phi(x)$, then $m'_r$ will be the r-th moment of f(x) ;
and so on.

For detailed description of many aspects of theoretical 
and practical statistics, and for bibliographical references to 
memoirs and texts on the subject, the reader may consult 
An Introduction to the Theory of Statistics, by G. U. Yule 
and M. G. Kendall, London, 1937, the 11th edition of the 
original book by the first-named author. 

PROBABILITY AND FREQUENCY DISTRIBUTIONS:
GRAPHICAL REPRESENTATION: CALCULATION 
OF MOMENTS 

13. Distributions, Probability Curve, Histogram.

The assemblage of values of probabilities $\phi(x_j)$ for all the
possible values $x_j$ of x that may occur in any system S,
is called the probability distribution of x in S. In practice
a set of n observations in a sample does not usually give
all the possible values $x_j$, and certainly cannot give them
all if they cover a continuous range. Further, the sample
of n values is itself only one member of the population,
often prodigiously large or even infinite, of possible samples
of n values that might have been drawn.

The relative frequency of $x_j$ in a sample of n values is
denoted by $f(x_j)$. The assemblage of relative frequencies
$f(x_j)$ for the sample is then called the frequency distribution
of x in that sample. The name is also often given to the
assemblage of absolute or actual frequencies, but these are
merely obtained by multiplying all relative frequencies
by n.

Ex. 1. In repeated throws of a symmetrical coin the
respective probabilities of runs of 1 head, 2 heads, 3 heads,
... are $\tfrac12$, $\tfrac14$, $\tfrac18$, ... . Hence in 400 throws the ideal
probability distribution may be tabulated (to the nearest
integer) as :

x         1    2    3    4    5    6    7    8    Total
$n\phi$  50   25   13    6    3    2    1    0    100

In an actual experiment of 400 throws (performed by the
author) there were 196 heads, and the frequency
distribution of runs of x heads was :

x         1    2    3    4    5    6    7    8    Total
nf       51   24   14    4    5    1    0    1    100

Comparing the actual with the theoretical distribution,
the reader will note a fairly close agreement, and also a slight
irregularity in the frequencies.
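
The ideal row of the first table can be reproduced in a few lines. The sketch below is an illustration only : it assumes, as the table does, a total of 100 runs, takes the probability of a run of x heads to be $(\tfrac12)^x$, and uses a hypothetical helper `ideal_run_counts`. The observed row is copied from the experiment quoted above.

```python
from math import floor

def ideal_run_counts(total_runs, longest):
    """Expected number of runs of x heads, x = 1..longest,
    rounded to the nearest integer.

    A run of heads has length x with probability (1/2)**x.
    """
    # floor(v + 0.5) rounds halves upward; Python's round() would turn 12.5 into 12
    return [floor(total_runs * 0.5**x + 0.5) for x in range(1, longest + 1)]

ideal = ideal_run_counts(100, 8)

observed = [51, 24, 14, 4, 5, 1, 0, 1]      # frequencies nf quoted in Ex. 1
heads = sum(x * n for x, n in enumerate(observed, start=1))
```

Summing x times the observed frequency over all run lengths recovers the number of heads in the experiment, which is a convenient check on the tabulation.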

If x is a continuous variate, the curve $y = \phi(x)$ is
called the probability curve of x. (The term “frequency
curve” will often be found, but it is not strictly accurate.
Cf. 12.) The curve may be symmetrical about its central
ordinate ; or it may have the “long tail” to the positive
or right side, in which case it is said to be positively skew ;
or to the negative or left side, in which case it is negatively
skew. In some cases, as in the probabilities of runs of
heads just considered, the curve may not descend at all
on one side or the other. A curve so extremely skew is
called positively J-shaped, or negatively J-shaped, as the
case may be. In a rare type of distribution called the
U-shaped curve the minimum ordinate is in the middle
region. The area under a probability curve measures the
total probability of all possible values of x, and is therefore
equal to 1.

[Figure: probability curves. Panels: Neg. J-shaped ; Neg. skew ; Pos. skew ; Pos. J-shaped ; U-shaped.]

If x is a discontinuous variate the plotted points (x, y),
where $y = \phi(x)$, do not form a curve. The sum of the
ordinates is equal to 1. It is customary, though there
is no very cogent reason for doing so, to join these points
each to its neighbour by straight lines, thus obtaining the
probability polygon for the distribution in question. The
terms “symmetry” and “skewness” then have corresponding
meanings.

Frequency Polygon, Histogram. In an actual
sample of observations we have relative frequencies instead
of probabilities. If the variate x is discontinuous, as for
example the number of flowers on stalks, the number of
beans in bean-pods, we obtain separate plotted points
(x, f(x)) which, joined to their neighbours, form a frequency
polygon.

[Figure: Frequency Polygon. Histogram.]

On the other hand, x may be a continuous variate, the
range of which in the process of measurement is broken
for convenience into intervals of finite breadth. For
example, height of men, measured in inches, is a continuous
variate ; all heights within a certain range are conceivable.
But in practice heights may be recorded to the nearest
inch, in which case all individuals of the sample having
heights in the range 66.5000... to 67.4999... inches form
a frequency group or frequency class corresponding to
x = 67, the central point of the class. In such a case
it is customary to represent the class graphically not by
a single ordinate at the central point but by a rectangle
on the class-interval (as 66.5 to 67.5) as base and of height
proportional to the class frequency or relative frequency
f(x). The figure of juxtaposed rectangles is then called
the frequency histogram or simply the histogram (that is,
diagram made up of cells), and it furnishes a rough
approximation to the ideal probability curve.

Ex. 2. Plot the probability polygon for the runs of heads 
in Ex. 1 ; also the frequency polygon of the experiment. 

Ex. 3. Note that often great care must be taken to
ascertain the exact class-boundaries and centres of classes.
For example, the British Anthropometric Committee (Report,
1883, p. 256) measured the height of 8585 adult males in the
British Isles, made up of samples of 6194 from England,
1304 from Scotland, 741 from Wales and 346 from Ireland.
The distribution of the Irish sample reads as follows :

x    59  60  61  62  63  64  65  66  67  68  69  70  71  72  73    n
nf    1   0   2   2   7  15  33  58  73  62  40  25  15  10   3  346

When we are told, however, that the class x = 59 inches
means “59 and over,” but at the same time that measurements
were to the nearest eighth of an inch, it appears that class
x = 59 means from $x = 58\tfrac{15}{16}$ to $59\tfrac{15}{16}$, so that the centre of
the class is at $x = 59\tfrac{7}{16}$ ; and so for every other class.

The reader should draw the histogram for the above
distribution, choosing not too small a scale for the
frequency.

For ease and rapidity in computation we can always 
by a change of origin take any convenient value of x as 
new origin, and by a change of scale make class intervals 
of unit breadth. At the end of any calculations we can 
translate the results back to the proper origin and scale. 
It is often convenient to choose a provisional origin either 
near the middle values of x or at one or other end of the 
range. 

Ex. 4. In the distribution of Ex. 3, if 67 is taken as new
origin for x, the classes range from x = -8 to x = +6. If
these classes are presumed to be centred, the origin is not
67 but $67\tfrac{7}{16}$.

14. Descriptive Parameters of Distribution. Pro- 
bability and frequency distributions may be described, 
not completely, but in their main features, by the values 
of their moments, factorial moments or other parameters. 
Some of these parameters have a geometrical significance. 

Typical Parameters or Averages. There are three 
of these in common use, the mode, the median and the 
arithmetic mean. 

Mode. The mode is the value of x for which the
probability $\phi(x)$, or in a frequency distribution the relative
frequency f(x), is a maximum, that is, greater than the
probability (or frequency) on either side. In a probability
curve it is the abscissa of a maximal ordinate.

Many curves have a single maximum near the middle ;
others may show two maxima or more. These are called
bimodal or multimodal, as the case may be.

Median. The median is that value of x which divides
the sum or integral of the probabilities over the whole
range into two equal parts. This sum or integral must
be equal to 1 ; and so if the range of values of x is from
x = a to x = b, the median value of x is defined by

$$\sum_a^x \phi(x) = \sum_x^b \phi(x) = \tfrac12 \sum_a^b \phi(x) = \tfrac12 \qquad (1)$$

$$\text{or}\quad \int_a^x \phi(x)\,dx = \int_x^b \phi(x)\,dx = \tfrac12 \int_a^b \phi(x)\,dx = \tfrac12. \qquad (2)$$

For a continuous probability curve the median ordinate,
by (2), bisects the area under the curve.

[Figure: Median.]

Arithmetic Mean. The most widely used typical
measure is the arithmetic mean, which is simply the
first moment or mathematical expectation of x, namely

$$\mu'_1 = \sum x\,\phi(x) \quad\text{or}\quad \int x\,\phi(x)\,dx. \qquad (3)$$

These formulae are the same as those occurring in
dynamics for the centroid of a series of particles of masses
$\phi(x_j)$ placed at points $x_j$ along a straight line, and the
centroid of a straight rod of density $\phi(x)$ at the point x.
It follows that the arithmetic mean is the abscissa of the
ordinate through the centroid of the area under the curve.

[Figure: Mode. Bimodal Curve.]

The arithmetic mean of the values in a sample is
correspondingly $\sum x f(x)$.

Remark. In many probability curves of slight or moderate
skewness the median lies between the mode and the arithmetic
mean, nearly twice as far from the mode as from the mean.

Moments about the Mean. The arithmetic mean is
so fundamental in theory and in practice that it is
customary, once it has been determined, to take it as a
new origin and to refer all higher moments to this origin.
Moments about the mean as origin are usually denoted
by undashed $\mu$'s. We find easily, by binomial expansion,

$$\mu_r = \sum (x - \mu'_1)^r \phi(x) \quad\text{or}\quad \int (x - \mu'_1)^r \phi(x)\,dx$$
$$= \mu'_r - r_{(1)}\,\mu'_1\mu'_{r-1} + r_{(2)}\,(\mu'_1)^2\mu'_{r-2} - \ldots + (-1)^{r-1} r_{(r-1)}\,(\mu'_1)^{r-1}\mu'_1 + (-1)^r (\mu'_1)^r, \qquad (4)$$

where $r_{(s)}$ denotes the familiar binomial coefficient
$r(r-1)\ldots(r-s+1)/s!$. The last two terms can be merged
into one as $(-1)^{r-1}(r-1)(\mu'_1)^r$. For example :

$$\mu_1 = 0,$$
$$\mu_2 = \mu'_2 - (\mu'_1)^2,$$
$$\mu_3 = \mu'_3 - 3\mu'_1\mu'_2 + 2(\mu'_1)^3,$$
$$\mu_4 = \mu'_4 - 4\mu'_1\mu'_3 + 6(\mu'_1)^2\mu'_2 - 3(\mu'_1)^4, \qquad (5)$$

formulae of regular application in practical work, since
they hold equally well for moments of a frequency distribution,
$\phi(x)$ being then replaced by f(x), and $\mu$ by m.
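
The binomial formula for moments about the mean is easily checked against direct computation. The sketch below is an illustration under stated assumptions : the function names are hypothetical, and the distribution (an unsymmetrical four-sided die) has probabilities chosen arbitrarily for the purpose.

```python
from fractions import Fraction
from math import comb

def raw_moment(phi, r, origin=0):
    """r-th moment of phi about the given origin."""
    return sum((Fraction(x) - origin)**r * p for x, p in phi.items())

def central_from_raw(raw):
    """Moments about the mean from moments about the origin.

    raw[r] is mu'_r, with raw[0] == 1; entry r of the result is
    sum over s of (-1)**s * C(r, s) * (mu'_1)**s * mu'_{r-s}.
    """
    m1 = raw[1]
    return [sum((-1)**s * comb(r, s) * m1**s * raw[r - s] for s in range(r + 1))
            for r in range(len(raw))]

# an unsymmetrical four-sided die; probabilities chosen only for illustration
phi = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8)}

raw = [raw_moment(phi, r) for r in range(5)]
central = central_from_raw(raw)

# the same moments computed directly about the mean, for comparison
direct = [raw_moment(phi, r, origin=raw[1]) for r in range(5)]
```

In practical work the route through `central_from_raw` is the useful one, since the moments about a convenient provisional origin are usually computed first and only afterwards referred to the mean.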

Other means, such as the geometric and the harmonic 
mean, are very occasionally used with respect to rather 
special distributions. 

Seminvariants in Terms of Moments about the
Mean. By expanding the logarithm of

$$M(\alpha) = e^{\mu'_1\alpha}\,(1 + \mu_2\alpha^2/2! + \mu_3\alpha^3/3! + \ldots)$$

as a series in powers of $\alpha$, and comparing the coefficients
of these with the coefficients in 10 (7), we find the relations
between seminvariants (or cumulants) and moments
about the mean. The first four relations are

$$\lambda_1 = \mu'_1, \quad \lambda_2 = \mu_2, \quad \lambda_3 = \mu_3, \quad \lambda_4 = \mu_4 - 3\mu_2^2.$$

15. Measures of Dispersion or Spread. Distribu- 
tions differ according as the values of x are spread densely 
or widely on either side of the mean. To describe this 
feature numerically we need parameters measuring 
dispersion. 

The arithmetic mean of the deviations $x - \mu'_1$ from the
mean is of course of no use for the purpose, being equal to
zero. A measure occasionally used, but now falling into
disuse, is the mean absolute deviation (the former name was
“mean error”) defined by the arithmetic mean of deviations
from the mean all taken with positive sign, namely

$$\sum |x - \mu'_1|\,\phi(x) \quad\text{or}\quad \int |x - \mu'_1|\,\phi(x)\,dx, \qquad (1)$$

where $|x - \mu'_1|$ denotes the positive numerical, or absolute,
value of $x - \mu'_1$.

Though usually computed with respect to $\mu'_1$, it is
actually in closer association with the median, in virtue
of a certain minimal property, namely :

The median value of x is such that the sum of the absolute
deviations from it, $\sum |x - x_j|$, is a minimum.

The median of a discrete set of values $x_j$ needs more
precise definition. If an odd number of values is ranged
in monotonic order $x_0, x_1, \ldots, x_{2n}$, so that each $x_j \le x_{j+1}$,
we shall define the median as the middle value, $x_n$. If
an even number of values is so arranged as $x_0, x_1, \ldots, x_{2n-1}$,
we shall say that the median is any value of x in
the middle interval, that is, is such that $x_{n-1} \le x \le x_n$.
The minimal property may then be proved as follows :

(a) Let there be 2n+1 values $x_0, x_1, \ldots, x_{2n}$, and let
us call the interval between $x_{j-1}$ and $x_j$ inclusive the
j-th interval. The median is at $x = x_n$. Let us denote by
S(x) the sum $\sum |x - x_j|$ of absolute deviations from any x.

First consider S(x) as compared with $S(x_n)$, where x
is in the (n+1)-th interval, on the right of the median,
and $x - x_n = h$. Then the absolute deviations of the n+1
values $x_0, x_1, \ldots, x_n$ on the one side have each been
increased by h, while those of the n values $x_{n+1}, \ldots, x_{2n}$
on the other have each been decreased by h.
Hence in this interval

$$S(x) - S(x_n) = h. \qquad (2)$$

Now suppose x moves into the next interval, the
(n+2)-th. Comparing S(x) with $S(x_{n+1})$ we note that if
$x - x_{n+1} = h$ the absolute deviations of the n+2 values
$x_0, x_1, \ldots, x_{n+1}$ each receive an increment h, while those
of the remaining n-1 values receive a decrement h.
Hence in this interval

$$S(x) - S(x_{n+1}) = 3h. \qquad (3)$$

In this way S(x) increases as x moves through successive
intervals to the right, the increments which it receives
within the intervals being h, 3h, 5h, ..., (2n-1)h ; and
by symmetry, or by a similar proof, S(x) receives corresponding
increments as x moves through successive intervals
to the left of $x_n$.

Hence S(x) is a minimum for $x = x_n$.

(b) Let there be 2n values $x_0, x_1, \ldots, x_{2n-1}$.

The reader will see at once that if x lies in the central
interval, the n-th interval, and if within that interval it is
displaced by an amount h, then n absolute deviations on
the one side each receive an increment h, while n on the
other each receive a decrement h. Hence S(x) is constant
within the central interval.

Also, as x moves out of the central interval either to
right or to left through successive intervals, S(x) receives
the respective increments 2h, 4h, ..., (2n-2)h. Hence
S(x) is a minimum within the central interval.

(c) The result for a continuous variate x can be proved
as a limiting case of (a) and (b), or else directly thus :

Let (-a, b) be the range of values of x, the median being
taken as the origin x = 0, so that

$$\int_{-a}^{0} \phi(x)\,dx = \int_{0}^{b} \phi(x)\,dx = \tfrac12. \qquad (4)$$

The integral S(h) of absolute deviations from x = h,
h>0, is then

$$S(h) = \int_{-a}^{h} (h-x)\,\phi(x)\,dx + \int_{h}^{b} (x-h)\,\phi(x)\,dx$$
$$= \int_{-a}^{0} (h-x)\,\phi(x)\,dx + \int_{0}^{b} (x-h)\,\phi(x)\,dx + \left[\int_{0}^{h} (h-x)\,\phi(x)\,dx - \int_{0}^{h} (x-h)\,\phi(x)\,dx\right], \qquad (5)$$

whereas

$$S(0) = \int_{-a}^{0} (-x)\,\phi(x)\,dx + \int_{0}^{b} x\,\phi(x)\,dx. \qquad (6)$$

Hence

$$S(h) - S(0) = h\left[\int_{-a}^{0} \phi(x)\,dx - \int_{0}^{b} \phi(x)\,dx\right] + 2\int_{0}^{h} (h-x)\,\phi(x)\,dx$$
$$= 2\int_{0}^{h} (h-x)\,\phi(x)\,dx, \qquad (7)$$

the bracket vanishing by (4) ; and this is essentially positive,
since $\phi(x)$ is a positive function. The same result may be
proved to hold for h<0, and so S(0) is a minimum.

Note. The indeterminacy of the median of an even number 
of discrete values matters exceedingly little in practice, 
the two middle values being for the most part indistinguishably 
close. 
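
The behaviour of S(x) described in (a) and (b) can be observed on a small set of numbers. The following sketch is illustrative only : the data are arbitrary, the helper `abs_dev_sum` is hypothetical, and S(x) is simply evaluated on a grid of quarter-units rather than minimized analytically.

```python
def abs_dev_sum(values, x):
    """S(x): sum of absolute deviations of the values from x."""
    return sum(abs(x - v) for v in values)

odd = [1, 3, 4, 8, 13]           # 2n+1 = 5 values, median 4
even = [1, 3, 4, 8, 13, 20]      # 2n = 6 values, central interval from 4 to 8

# Evaluate S(x) on a grid of quarter-units from 0 to 21
grid = [k / 4 for k in range(0, 85)]
s_odd = {x: abs_dev_sum(odd, x) for x in grid}
s_even = {x: abs_dev_sum(even, x) for x in grid}

# Position of the single minimum for the odd set
best_odd = min(s_odd, key=s_odd.get)

# All grid points at which the even set attains its minimum
least_even = min(s_even.values())
flat_even = [x for x in grid if s_even[x] == least_even]
```

For the odd set the minimum falls at the middle value alone, and a unit step to the right of it raises S by 1, a unit step within the next interval by 3, in accordance with (2) and (3) ; for the even set the minimum is attained throughout the central interval.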


The Quantiles. The median ordinate halves the
distribution. Halving again the two halves, we may find
values of x which are called the quartile measures. For
discrete distributions, they lie one-quarter and three-quarters
along the line of values supposed arranged
in ascending order. For continuous distributions of range
x = a to x = b they are values $q_1$, $q_3$ (it is hardly worth
while here to press further Greek letters into service)
such that

$$\int_a^{q_1} \phi(x)\,dx = \int_{q_3}^{b} \phi(x)\,dx = \tfrac14. \qquad (8)$$

The median might be regarded as a middle quartile $q_2$ ;
the other two are called the upper and lower quartiles.
The value of $\tfrac12(q_3 - q_1)$ furnishes a measure of dispersion
called the semi-interquartile range. Any value of x has an
a priori probability of $\tfrac12$ of being such that $q_1 < x < q_3$ ;
it is as likely to be inside the range as outside it. For
this reason, in the theory of errors, this particular measure
of dispersion has long been called the probable error of
the distribution. The name is very misleading, since there
is nothing specially probable about this particular deviation ;
and of late there has been a salutary tendency to
supersede the so-called probable error by the standard
deviation, which we now define.

Standard Deviation. The arithmetic mean of the
squared deviations $(x - \mu'_1)^2$ from the mean, that is, the
second moment $\mu_2$, is obviously a suitable measure of
dispersion. The square root of this, formerly called
the root-mean-square deviation, is now called the standard
deviation and is denoted by $\sigma$. The sample value is
denoted by s. Thus $\sigma^2 = \mu_2$, $s^2 = m_2$.

Variance. Modern usage is tending more and more to
treat $\mu_2$ or $\sigma^2$ itself, rather than $\sigma$, as a suitable measure of
dispersion, under the name of variance. We have therefore

$$\sigma^2 = \sum (x - \mu'_1)^2 \phi(x) \quad\text{or}\quad \int (x - \mu'_1)^2 \phi(x)\,dx, \qquad (9)$$
$$\text{while}\quad s^2 = \sum (x - m'_1)^2 f(x). \qquad (10)$$

The standard deviation has also a minimal property,
with respect to the arithmetic mean, namely :

The sum or mean of squared deviations is a minimum
when taken with respect to the arithmetic mean.

This fact is obvious at once from the formula of 14 (5)

$$\mu_2 = \mu'_2 - (\mu'_1)^2,$$

which shows that $\mu_2$ can never exceed $\mu'_2$, the mean of
squared deviations from any other origin.

Mean and Variance of Linear Function. If we
distinguish the respective means and variances of three
independent variates x, y, z by triple suffixes, thus,
$\mu'_{100}, \mu'_{010}, \mu'_{001}$ ; $\mu_{200}, \mu_{020}, \mu_{002}$, then from the properties
of seminvariants (11) the linear function ax+by+cz has

mean $a\mu'_{100} + b\mu'_{010} + c\mu'_{001}$,

variance $a^2\mu_{200} + b^2\mu_{020} + c^2\mu_{002}$,

and similarly for a general linear function in any number
of independent variates.
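
These rules can be confirmed by building the exact distribution of a linear function of independent discrete variates. The sketch below is an illustration, not a general method : the three distributions and the coefficients a, b, c are arbitrary assumptions, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def mean(phi):
    """First moment about the origin."""
    return sum(Fraction(v) * p for v, p in phi.items())

def central_moment(phi, r):
    """r-th moment about the mean."""
    m = mean(phi)
    return sum((Fraction(v) - m)**r * p for v, p in phi.items())

def linear_function(coeffs, dists):
    """Exact distribution of a*x + b*y + ... for independent variates."""
    out = {}
    for combo in product(*(d.items() for d in dists)):
        value = sum(c * v for c, (v, _) in zip(coeffs, combo))
        prob = Fraction(1)
        for _, p in combo:
            prob *= p                 # independence: probabilities multiply
        out[value] = out.get(value, 0) + prob
    return out

x = {0: Fraction(1, 2), 1: Fraction(1, 2)}                        # symmetrical coin
y = {v: Fraction(1, 6) for v in range(1, 7)}                      # cubical die
z = {0: Fraction(3, 4), 1: Fraction(3, 16), 2: Fraction(1, 16)}   # a skew variate
a, b, c = 2, -1, 3

w = linear_function((a, b, c), (x, y, z))
```

The mean and the second moment about the mean of `w` may then be compared with the expressions above ; the same enumeration serves equally for third moments about the mean.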

Range, Extremes. Other indications of the dispersion
of a distribution are given by the size of the range
of x itself, b-a, as well as by the highest value, b, or
lowest value, a.


16. Measures of Asymmetry or Skewness. When
the mean is taken as origin x = 0, it may happen that
$\phi(x) = \phi(-x)$, so that the distribution is symmetrical.

Ex. 1. The distribution of number of heads in a throw of
n symmetrical coins, described by the g.f. $(\tfrac12 t + \tfrac12)^n$, is
symmetrical about $x = \tfrac12 n$.

Ex. 2. The continuous distribution described by

$$dp = \frac{1}{\sqrt{2\pi}}\, e^{-\frac12 (x-a)^2}\, dx$$

is symmetrical about x = a.

Ex. 3. The distribution given by

$$dp = \frac{1}{\pi}\, \frac{dx}{1 + x^2}$$

is symmetrical about x = 0.

Lack of symmetry, skewness, is revealed functionally
or numerically in various ways.

Various Measures of Skewness. In a symmetrical
distribution the distances of the quartiles $q_1$ and $q_3$ from
the median $q_2$ will be equal. In a skew distribution the
difference between these distances gives a coefficient of
skewness, namely

$$\{(q_3 - q_2) - (q_2 - q_1)\}/\sigma,$$

the division by $\sigma$ being for the purpose of removing
arbitrary units of scale and obtaining an absolute coefficient.

A natural measure of skewness is however the third
moment about the mean, $\mu_3$. If the distribution is symmetrical
$\mu_3 = 0$. If the long tail of the distribution is
on the side of the positive values of x, the cubes of positive
values of x outweigh the cubes of negative values, so that
$\mu_3$ is positive, and we have positive skewness. In the
same way if the long tail of the curve is on the side of
the negative values of x, then $\mu_3$ is negative, and we have
negative skewness.

To remove arbitrary units of measure, since $\mu_3$ is of
the dimensions of $x^3$, or of $\sigma^3$, we construct an absolute
measure of skewness by dividing $\mu_3$ by $\sigma^3$, that is by $\mu_2^{3/2}$.
The square of this, $\mu_3^2/\mu_2^3$, is often denoted by $\beta_1$.

xAnother measure of skewness (due to K. Pearson) 
depends on the fact that in a skew curve the mean, median 
and mode are not the same. The measure in question is 
defined by 

(Mean —Mode) / (Standard Deviation) . 

Like /Xg it is positive for positive skewness, zero for 
symmetry, negative for negative skewness. 

Skewness of Linear Function. If the 3rd moments
of independent variates $x$, $y$, $z$ about their means are
$\mu_{300}$, $\mu_{030}$, $\mu_{003}$ respectively, the 3rd moment of $ax+by+cz$ about
its mean is
$$a^3\mu_{300}+b^3\mu_{030}+c^3\mu_{003},$$
and similarly for linear functions of any number of
independent variates. This follows (14) because $\mu_3 = \lambda_3$.

17. Measure of Flattening or Excess, Kurtosis.
Two distributions may have the same mean, the same
standard deviation, the same skewness, and yet may
differ in that the curve of the one may be more flattened
at the centre (platykurtic) than that of the other.

The degree of flattening is suitably measured by the
4th moment about the mean, $\mu_4$. Removing arbitrary
units of measure, just as in the case of $\mu_3$, we obtain the
coefficient $\mu_4/\mu_2^2$, often denoted by $\beta_2$. It has been observed
in an extensive class of probability curves, with scale chosen
so that the variance is unity, that the ordinate at the
mean or mode is greater or less according as $\beta_2$ itself is
greater or less. Thus the value of $\beta_2$ serves to indicate
whether the curve is tall and slim at the centre (leptokurtic)
or squat (platykurtic). In the very important normal
probability curve, which we shall meet in 32, the value of
$\beta_2$ is 3. Hence $\beta_2-3$ is sometimes called the excess, curves
for which $\beta_2 < 3$ being platykurtic, those for which
$\beta_2 > 3$ being leptokurtic, the normal curve being taken as
standard.

Higher Moments. No simple geometrical
interpretation attaches to parameters expressed by moments
$\mu_r$ or $m_r$ higher than the 4th, except of course that the
moments of even order might be regarded as further
measures of dispersion, and those of odd order as further
measures of skewness. These higher moments are in any
case very seldom used in practice for frequency distributions,
because being computed from values of $x$ liable to random
irregularity, "error" as it is usually called, they may be
subject to very great error owing to the raising of some
abnormally frequent large deviation $x$ to a high power.
This will be apparent when we come to consider the
sampling error of coefficients, in Chapter VII.

18. Practical Computation of Moments. The
initial stages of the analysis of frequency distributions
almost always involve the computation of ordinary or
factorial moments. In the case of a continuous variate
artificially grouped (13) into classes, a certain error is
introduced into the moments by the centring of
class-frequencies about the centre of the class. The calculated
moments then require adjustment by formulae of rather
wide application called Sheppard's Corrections.

The example which follows shows the computation of the
first four moments and the coefficients of dispersion and
excess, for a frequency distribution. The column headings
explain themselves. It will be observed that transference
is made to the more convenient provisional mean $x = 67$,
this being judged by inspection of the distribution to be
somewhere near the true mean.

Sheppard's corrections have not been used; we shall
allude to this example when we come to discuss them.
As for the mean height of the group, the provisional origin
is really, as we saw earlier, 67 7/16, or 67.44 inches. Hence
the mean height is 67.44 + 0.34 = 67.78 inches.

The distribution shows a slight negative skewness.
Whether this is a genuine effect or due to the irregularities
of sampling cannot be decided until we know more about
the probability distributions of coefficients calculated from
samples.

The reader should verify that the sample estimates of $\beta_1$ and
$\beta_2$ are 0.0014 and 3.56.
Example. The distribution of heights of adult Irishmen.
(In the product columns x is measured from the provisional mean 67;
bracketed entries are the subtotals of negative and positive products.)

    x     nf   x-67     nfx   nfx^2    nfx^3    nfx^4
   59      1    -8       -8      64     -512     4096
   60      0    -7        0       0        0        0
   61      2    -6      -12      72     -432     2592
   62      2    -5      -10      50     -250     1250
   63      7    -4      -28     112     -448     1792
   64     15    -3      -45     135     -405     1215
   65     33    -2      -66     132     -264      528
   66     58    -1      -58      58      -58       58
   67     73     0        0       0        0        0
                      (-227)           (-2369)
   68     62     1       62      62       62       62
   69     40     2       80     160      320      640
   70     25     3       75     225      675     2025
   71     15     4       60     240      960     3840
   72     10     5       50     250     1250     6250
   73      3     6       18     108      648     3888
                       (345)            (3915)
  Sum    346            118    1668     1546    28236
  Sum/346             0.341   4.821    4.468    81.61

$m_1' = 0.341 + 67.$

$m_2 = 4.821 - (0.341)^2 = 4.705.$

$m_3 = 4.468 - 3(0.341)(4.821) + 2(0.341)^3 = -0.385.$

$m_4 = 81.61 - 4(0.341)(4.468) + 6(0.341)^2(4.821) - 3(0.341)^4 = 78.84.$
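
The arithmetic of this example may be checked mechanically. The following Python sketch is an illustration only, not part of the original computation; the function names `raw_moments` and `central_moments` are our own. It forms the moments about the provisional mean 67 and transfers them to the mean by the formula of 14 (5).

```python
# Heights in inches and their frequencies, as in the table above.
heights = list(range(59, 74))
freqs = [1, 0, 2, 2, 7, 15, 33, 58, 73, 62, 40, 25, 15, 10, 3]

def raw_moments(values, freqs, origin, max_order=4):
    """Return [m'_1, ..., m'_max_order] about the provisional origin."""
    n = sum(freqs)
    return [sum(f * (v - origin) ** r for v, f in zip(values, freqs)) / n
            for r in range(1, max_order + 1)]

def central_moments(m1, m2, m3, m4):
    """Transfer moments about the origin to moments about the mean."""
    c2 = m2 - m1 ** 2
    c3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    c4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return c2, c3, c4

m1, m2, m3, m4 = raw_moments(heights, freqs, origin=67)
c2, c3, c4 = central_moments(m1, m2, m3, m4)
beta1 = c3 ** 2 / c2 ** 3   # squared coefficient of skewness
beta2 = c4 / c2 ** 2        # coefficient whose excess over 3 measures kurtosis
print(round(c2, 3), round(c3, 3), round(c4, 2), round(beta1, 4), round(beta2, 2))
```

Working to full machine precision instead of three or four figures alters the last printed digit of some results slightly, which is to be expected.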


19. Computation of Moments by Repeated Summation.
If the origin of a distribution be taken at either
end, preferably at the lower end, factorial moments can
be computed by a process of repeated summation. We
sum frequencies in columns from the remote value of $x$
towards the origin, in the manner exemplified below.
The leading sum in each column is one step lower than the
leading sum in the preceding column.
Ex. 1. The same distribution, with origin at x = 59 (so that 59 corresponds to x = 0).

    x    nf      Σ      Σ²      Σ³      Σ⁴      Σ⁵
    0     1    346
    1     0    345    2886
    2     2    345    2541   11407
    3     2    343    2196    8866   28343
    4     7    341    1853    6670   19477   49757
    5    15    334    1512    4817   12807   30280
    6    33    319    1178    3305    7990   17473
    7    58    286     859    2127    4685    9483
    8    73    228     573    1268    2558    4798
    9    62    155     345     695    1290    2240
   10    40     93     190     350     595     950
   11    25     53      97     160     245     355
   12    15     28      44      63      85     110
   13    10     13      16      19      22      25
   14     3      3       3       3       3       3
   r!            1       1       2       6      24

The successive sums at the heads of columns may be proved
(Appendix 2) to be equal to $n\,m'_{(r)}/r!$. We have therefore

$n = 346$, $n m'_{(1)} = 2886$, $n m'_{(2)} = 22814$, $n m'_{(3)} = 170058$,
$n m'_{(4)} = 1194168$.

Transforming to ordinary moments $m'$ by the relations
(Appendix 3)

$$m_1' = m'_{(1)},$$
$$m_2' = m'_{(2)} + m'_{(1)},$$
$$m_3' = m'_{(3)} + 3m'_{(2)} + m'_{(1)},$$
$$m_4' = m'_{(4)} + 6m'_{(3)} + 7m'_{(2)} + m'_{(1)}, \qquad (1)$$

we obtain

$m_1' = 2886/346 = 8.34104$,
$m_2' = 25700/346 = 74.2775$,
$m_3' = 241386/346 = 697.647$,
$m_4' = 2377100/346 = 6870.23$,

from which, by adjusting (14 (5)) to moments about the
mean, we derive

$m_2 = 4.705$, $m_3 = -0.386$, $m_4 = 78.87$.
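
The process of repeated summation lends itself to a short program. The sketch below is an illustration under our own naming (`factorial_moment_sums` is not a name from the text). It sums each column from the remote end towards the origin, takes the leading sum of the column for $r$ at $x = r$, and multiplies by $r!$ to recover $n\,m'_{(r)}$.

```python
from itertools import accumulate
from math import factorial

# Frequencies for x = 0, 1, ..., 14, the origin being at 59 inches.
freqs = [1, 0, 2, 2, 7, 15, 33, 58, 73, 62, 40, 25, 15, 10, 3]

def factorial_moment_sums(freqs, max_order):
    """Return [n, n*m'_(1), ..., n*m'_(max_order)] by repeated summation."""
    column = list(freqs)
    sums = []
    for r in range(max_order + 1):
        # Sum the previous column from the remote end towards the origin.
        column = list(accumulate(reversed(column)))[::-1]
        # The leading sum for order r stands at x = r and equals n*m'_(r)/r!.
        sums.append(column[r] * factorial(r))
    return sums

n, f1, f2, f3, f4 = factorial_moment_sums(freqs, 4)
# Ordinary moments about the origin, by the relations (1).
raw = [f1 / n,
       (f2 + f1) / n,
       (f3 + 3 * f2 + f1) / n,
       (f4 + 6 * f3 + 7 * f2 + f1) / n]
print(n, f1, f2, f3, f4)
print([round(m, 4) for m in raw])
```

Since the sums are kept as exact integers until the final division, none of the loss of figures mentioned in the text arises here.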
Now it may be noted that some advantage has been
lost through the large numbers that arise in summations 
from end to end. Even though six significant digits have 
been retained throughout, the final results are very slightly 
discrepant with those computed by the other method. To 
obviate these disadvantages (which are not serious when a 
calculating machine is available) one may use either (i) fac- 
torial moments obtained by summation from both ends 
towards an origin near the centre, or (ii) central and mean 
central factorial moments obtained by a slight modification 
of this summation. 

Ex. 2. Ordinary factorial moments. Origin at x = 67.

    x    nf      Σ      Σ²      Σ³      Σ⁴      Σ⁵
   59     1      1       1       1       1       1
   60     0      1       2       3       4       5
   61     2      3       5       8      12      17
   62     2      5      10      18      30      47
   63     7     12      22      40      70     117
   64    15     27      49      89     159     276
   65    33     60     109     198     357     633
   66    58   *118*   *227*   *425*   *782*  *1415*
   67    73   *228*
   68    62    155    *345*
   69    40     93     190    *350*
   70    25     53      97     160    *245*
   71    15     28      44      63      85    *110*
   72    10     13      16      19      22      25
   73     3      3       3       3       3       3
   r!            1       1       2       6      24

From the italicized entries (marked here with asterisks) we obtain
$n = 228 + 118 = 346$,
$n m'_{(1)} = 345 - 227 = 118$,
$n m'_{(2)} = (350 + 425)\,2 = 1550$,
$n m'_{(3)} = (245 - 782)\,6 = -3222$,
$n m'_{(4)} = (110 + 1415)\,24 = 36600$.
These may be transformed to ordinary moments $m'$ by
the same relations as before, yielding

$m_1' = 118/346 = 0.341$, $m_2' = 1668/346 = 4.821$,

$m_3' = 1546/346 = 4.468$, $m_4' = 28236/346 = 81.61$.

These are the same values as were found by the first method.

Ex. 3. Central and mean central factorial moments, with
the same origin x = 67.

Here we again sum towards the centre from the ends, but
each alternate sum (shown bracketed and italicized) involves
the adding of only half the last summand in the preceding
column, while the last sums in the other columns step
successively away from the centre, as shown.

    x    nf       Σ       Σ²       Σ³       Σ⁴       Σ⁵
   59     1       1        1        1        1        1
   60     0       1        2        3        4        5
   61     2       3        5        8       12       17
   62     2       5       10       18       30       47
   63     7      12       22       40       70      117
   64    15      27       49       89      159      276
   65    33      60      109      198     *357*   (454.5)
   66    58     118     *227*   (311.5)
   67    73   (154.5)
              (191.5)
   68    62     155     *345*   (522.5)
   69    40      93      190      350     *595*   (652.5)
   70    25      53       97      160      245      355
   71    15      28       44       63       85      110
   72    10      13       16       19       22       25
   73     3       3        3        3        3        3
  Sum   346
   r!             1        1        2        6       24

From the italicized entries (here the bracketed sums and those
marked with asterisks) we obtain the central factorial moments

$n = 191.5 + 154.5 = 346$,

$n m'_{[1]} = 345 - 227 = 118$,
$n m'_{[2]} = (522.5 + 311.5)\,2 = 1668$,

$n m'_{[3]} = (595 - 357)\,6 = 1428$,

$n m'_{[4]} = (652.5 + 454.5)\,24 = 26568$.
The formulae for the $m'$ in terms of the $m'_{[r]}$ are rather
simple (Appendix 3). We have

$m_1' = m'_{[1]} = 118/346 = 0.341$,

$m_2' = m'_{[2]} = 1668/346 = 4.821$,

$m_3' = m'_{[3]} + m'_{[1]} = (1428 + 118)/346 = 4.468$,

$m_4' = m'_{[4]} + m'_{[2]} = (26568 + 1668)/346 = 81.61$,

as before.

The moments about the mean can now be found in the
usual way.

20. Sheppard's Corrections for Grouped Moments.
As mentioned earlier, when a continuous distribution has
been grouped into centred classes for convenience, the
moments require adjustment or correction because of this
artificial grouping. The necessary formulae of correction
were found by W. F. Sheppard.

Naturally the problem for perfectly general functions
$\phi(x)$ is too broad, and it is necessary to impose conditions.
Sheppard considered the case where $\phi(x)$ was such that
the derivatives $\phi'(x)$, $\phi''(x)$, ... vanished in succession at
the boundaries $x = a$ and $x = b$ to such an order that

$$\int_a^b x^r\phi^{(s)}(x)\,dx = w\sum_j x_j^r\,\phi^{(s)}(x_j) \qquad (1)$$

to a sufficient degree of accuracy, where $w$ is the
class-breadth and $x_j$ the centre of a typical class; that is to
say, the error committed should be negligible compared
with sampling errors.

Remark. The relation between an integral and a sum of
equidistant ordinates of the kind here considered enters into
pure mathematics in the Euler-Maclaurin summation formula,
by which a sum of ordinates is expressed as an integral over
the range plus correction terms involving the derivatives of
odd order taken at the boundaries. In many cases, where
the derivatives $\phi'(x)$, $\phi''(x)$, ... are not absolutely zero but
converge to zero as a limit, the representation of the integral
on the left of (1) by the sum on the right needs very careful
investigation. It is found, however, that for the statistical
functions to which Sheppard's corrections are usually applied
the difference between the integral and the sum can be made
negligibly small by taking values of the class-interval $w$ of a
size quite customary in practice. Usually it is enough for $w$
to be less than the standard deviation. The following
derivation of the formulae must be regarded as approximate
only.

Ex. 1. The following two comparisons of integral with
sum over an infinite range are interesting in this respect:

$$\int_{-\infty}^{\infty}\frac{dx}{1+x^2} = \pi = 3.14159 \text{ nearly},$$
whereas
$$\sum_{-\infty}^{\infty}\frac{1}{1+x^2} = 3.15336 \text{ nearly},$$
$x$ taking the values 0, ±1, ±2, ... . Again,
$$\int_{-\infty}^{\infty} e^{-\frac12 x^2}\,dx = \sqrt{2\pi} = 2.506628275 \text{ nearly},$$
whereas
$$\sum_{-\infty}^{\infty} e^{-\frac12 x^2} = 2.506628288 \text{ nearly},$$
$x$ taking the same values as before. The first sum in these
examples is only moderately close to the corresponding
integral; the second is very close, and still closer results are
obtained if a summation with a finer subdivision of $x$ is used.
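
The two comparisons of Ex. 1 can be reproduced numerically. The Python sketch below is illustrative only; the function names and the truncation limits are our own choices. The first series converges slowly, so that it is truncated at a million terms on each side, leaving an error of roughly 2/limit. The second converges so fast that forty terms on each side are ample.

```python
import math

def cauchy_sum(limit):
    """Sum of 1/(1+x^2) over the integers x = -limit, ..., limit."""
    return 1.0 + 2.0 * math.fsum(1.0 / (1.0 + k * k) for k in range(1, limit + 1))

def gaussian_sum(limit=40):
    """Sum of exp(-x^2/2) over the integers x = -limit, ..., limit."""
    return 1.0 + 2.0 * math.fsum(math.exp(-0.5 * k * k) for k in range(1, limit + 1))

cauchy = cauchy_sum(1_000_000)   # truncation error is about 2/limit
gauss = gaussian_sum()
print(f"{cauchy:.5f} against pi         = {math.pi:.5f}")
print(f"{gauss:.9f} against sqrt(2 pi) = {math.sqrt(2 * math.pi):.9f}")
```

The contrast between the two cases reflects the rapidity with which the function and its derivatives fall away at the ends of the range.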


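Suppose the range $b-a$ divided into $n$ class-intervals
$(x_j-\tfrac12 w,\ x_j+\tfrac12 w)$, so that $b-a = nw$. If the probability in
the $j$th class is
$$p_j = \int_{x_j-\frac12 w}^{x_j+\frac12 w}\phi(x)\,dx, \qquad (2)$$
then the $r$th moment calculated from the grouped classes is
$$\bar\mu_r' = \sum_{j=1}^{n} x_j^r\,p_j, \qquad (3)$$
whereas the true moment is
$$\mu_r' = \int_a^b x^r\phi(x)\,dx.$$
Now
$$p_j = \int_{-\frac12 w}^{\frac12 w}\phi(x_j+x)\,dx
 = \int_{-\frac12 w}^{\frac12 w}\{\phi(x_j)+x\phi'(x_j)+x^2\phi''(x_j)/2!+\dots\}\,dx$$
$$= w\phi(x_j)+w^3\phi''(x_j)/24+w^5\phi^{iv}(x_j)/1920+\dots,$$
provided that this series in powers of $w$ converges.

Hence
$$\bar\mu_r' = \sum_j x_j^r p_j
 = \int_a^b x^r\phi(x)\,dx+\frac{w^2}{24}\int_a^b x^r\phi''(x)\,dx+\frac{w^4}{1920}\int_a^b x^r\phi^{iv}(x)\,dx+\dots,$$
in view of (1). Integrating by parts and using the fact
that derivatives vanish at the boundaries, we have
$$\bar\mu_r' = \mu_r'+\frac{r(r-1)}{24}\,w^2\mu'_{r-2}+\frac{r(r-1)(r-2)(r-3)}{1920}\,w^4\mu'_{r-4}+\dots \qquad (7)$$

If moments about the mean are taken, we have
therefore the relations (where $\bar\mu_r$ means the $r$th moment of
the grouped classes about the mean):
$$\bar\mu_0 = \mu_0 = 1,\quad \bar\mu_1 = \mu_1 = 0,\quad \bar\mu_2 = \mu_2+\tfrac{1}{12}w^2,$$
$$\bar\mu_3 = \mu_3,\quad \bar\mu_4 = \mu_4+\tfrac12 w^2\mu_2+\tfrac{1}{80}w^4, \qquad (8)$$
which, on being solved for the $\mu_r$, yield
$$\mu_1 = \bar\mu_1,\quad \mu_2 = \bar\mu_2-\tfrac{1}{12}w^2,\quad \mu_3 = \bar\mu_3,\quad
\mu_4 = \bar\mu_4-\tfrac12 w^2\bar\mu_2+\tfrac{7}{240}w^4, \qquad (9)$$
and these are the required adjustments, Sheppard's
corrections. The correction to the second moment is
especially simple and noteworthy. If the class-interval
is taken as the unit of scale, the correction amounts to
subtracting $\tfrac{1}{12}$ from the grouped second moment.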

It is customary, though the practice requires more 
justification than it has ever received, to apply the same 
corrections for grouping in the case of frequency distri- 
butions, the presumption being that the moments thus 
corrected are a better representation of the moments of 
the underlying probability distribution. 

Ex. 2. Correcting the moments about the mean for
grouping in the example of 18 and 19, we obtain for the
corrected moments

$m_2 = 4.705 - 0.083 = 4.622.$

$m_3 = -0.385.$

$m_4 = 78.84 - \tfrac12(4.705) + 0.029 = 76.52.$
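
Sheppard's corrections for the second, third and fourth moments about the mean are simple enough to be written as a small function. The sketch below is an illustration; the name `sheppard_correct` and the default class-interval of unity are assumptions of ours. It is applied to the grouped moments of the height distribution.

```python
def sheppard_correct(m2, m3, m4, w=1.0):
    """Apply Sheppard's corrections to grouped moments about the mean.

    w is the class-interval, in the same units as the variate.
    """
    c2 = m2 - w ** 2 / 12
    c3 = m3                                   # the third moment needs no correction
    c4 = m4 - 0.5 * w ** 2 * m2 + 7 * w ** 4 / 240
    return c2, c3, c4

c2, c3, c4 = sheppard_correct(4.705, -0.385, 78.84)
print(round(c2, 3), c3, round(c4, 2))
```

The corrections presuppose a continuous variate with high contact at both ends of the range; for a discrete variate they should not be applied.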

Ex. 3. The reader should seek out for himself numerous
examples of frequency distributions, and should acquire as
much practice as possible in computing moments in the various
ways exemplified above, and in correcting them. Sheppard's
correction will be applied in those cases in which the relative
frequency $f(x)$ in the sample corresponds to probability of a
continuous variate.
SPECIAL PROBABILITY DISTRIBUTIONS

21. Distributions of Equal Probability. If $n$ values $x_j$
of $x$, where $j = 1, 2, \dots, n$, have each equal probability
$1/n$, the graph of probability consists of $n$ ordinates of
equal height $1/n$. The case of a symmetrical coin is the
case $n = 2$, the case of an ordinary unbiassed die is the
case $n = 6$.

The Rectangular Distribution. The limiting case
of the preceding, when $n$ tends to infinity, yields an
important distribution called the rectangular distribution,
namely that in which $x$ has an equal probability of being
at any point in the range $x = a$ to $x = b$, $a < b$. The
probability differential is then given by

$$dp = \frac{dx}{b-a}, \qquad (1)$$

so that $\phi(x) = 1/(b-a)$ and the probability curve consists
of a rectangle on the range as base and of height $1/(b-a)$.
It is always possible to choose the central point of the
range for origin, and the unit of scale such that the range
becomes the new range $x = -\tfrac12$ to $x = \tfrac12$. The rectangle
is then a square. The moments of odd order vanish;
those of even order are

$$\mu_{2r} = \frac{1}{(2r+1)\,2^{2r}}. \qquad (2)$$

In particular $\mu_2 = \tfrac{1}{12}$, so that $\sigma = 1/\sqrt{12} = 0.2886...$

Example 1. Show the m.g.f. of the standard rectangular
distribution is $(\sinh\tfrac12\alpha)/\tfrac12\alpha$.
Example 2. The following samples from a rectangular
population have been arranged as frequency distributions.
The times on 1000 watches displayed in watchmakers'
windows were noted by the author. The distributions are
of the first and second 500 of these. Class x = 1 means
the class of all watch times from 1 h. to 1 h. 59 m. to the
nearest minute, and classes x = 2, 3, ... 12 have a similar
meaning.

     x       1   2   3   4   5   6   7   8   9  10  11  12     n
 (i)  nf    34  54  39  49  45  41  33  37  41  47  39  41   500
 (ii) nf    47  41  47  49  45  32  37  40  41  37  48  36   500

The mathematical expectation of the number in any class
is 500/12, or 42 to the nearest integer. One of the classes in
the above samples contains 54, and another contains 32. We
shall see later that the deviations here are not extreme.

The mathematical expectation, or mean of x in the
population, is 6.5. The means of x in the above samples
are 6.426 and 6.322.
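
The sample means quoted for the two sets of watches are readily verified. The following Python sketch is illustrative only, the variable and function names being our own; it recomputes them from the class frequencies, together with the expected number in each class.

```python
hours = list(range(1, 13))
sample_i = [34, 54, 39, 49, 45, 41, 33, 37, 41, 47, 39, 41]
sample_ii = [47, 41, 47, 49, 45, 32, 37, 40, 41, 37, 48, 36]

def class_mean(classes, freqs):
    """Mean of a grouped sample, each class counted at its label."""
    return sum(x * f for x, f in zip(classes, freqs)) / sum(freqs)

expected_per_class = 500 / 12
mean_i = class_mean(hours, sample_i)
mean_ii = class_mean(hours, sample_ii)
print(round(expected_per_class), mean_i, mean_ii)
```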

22. The Binomial Distribution. This fundamental
distribution arises when $n$ trials are made of a constant
system $S$ with probability $p$ of an event $E$, the number $x$
of successes in the $n$ trials being the variate. The g.f.
is $(pt+q)^n$, and so by binomial expansion the probability
function, namely the coefficient of $t^x$ in the g.f., is

$$\phi(x) = \binom{n}{x}p^x q^{n-x}. \qquad (1)$$

Moments. The f.m.g.f., obtained by putting $t = 1+\alpha$,
is seen at once to be $(1+p\alpha)^n$, so that the mean, the
coefficient of $\alpha$ in this, is $np$. Hence the f.m.g.f. about
the mean is (11)

$$(1+\alpha)^{-np}(1+p\alpha)^n$$
$$= [(1+p\alpha)(1-p\alpha+p(p+1)\alpha^2/2!-p(p+1)(p+2)\alpha^3/3!+\dots)]^n$$
$$= [1+p(1-p)\alpha^2/2!-2p(1-p)(p+1)\alpha^3/3!+\dots]^n, \qquad (2)$$

whence $\mu_{(2)} = npq$, $\mu_{(3)} = -2npq(p+1)$, so that

$$\mu_2 = \mu_{(2)} = npq,\qquad \mu_3 = \mu_{(3)}+3\mu_{(2)} = npq(q-p). \qquad (3)$$

It is readily proved in the same way, by finding $\mu_{(4)}$ and
hence $\mu_4$, that

$$\mu_4 = npq\,[p^2+(3n-4)pq+q^2]. \qquad (4)$$

The formula $\sigma = \sqrt{npq}$ is of fundamental importance.

Example. The following is a sample from a binomial
population. The Swedish astronomer and statistician, C. V. L.
Charlier, performed 1000 times the experiment of drawing 10
cards, one at a time with replacement after each drawing,
from an ordinary pack, the number $x$ of black cards in each
set of 10 cards being the variate. Thus $n = 10$, $p = \tfrac12$. He
obtained the distribution

   x       0    1    2    3    4    5    6    7    8    9   10      N
   Nf      3   10   43  116  221  247  202  115   34    9    0   1000

The corresponding probability distribution has g.f.
$(\tfrac12 t+\tfrac12)^{10}$. Multiplying this by 1000 and recording the
coefficients of powers of $t$ to the nearest integer, we obtain

   x       0    1    2    3    4    5    6    7    8    9   10      N
   Nφ      1   10   44  117  205  246  205  117   44   10    1   1000

From Charlier's data we find $m_1' = 4.933$, $m_2 = 2.415$.
The theoretical expectations are $\mu_1' = np = 5.00$ and
$\mu_2 = npq = 2.5$.

We shall consider in a later section (55) whether these
deviations of actual experimental results from theoretical
expectation are reasonable under the hypothesis of random
sampling.
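
The expected frequencies and the sample moments of Charlier's experiment may be recomputed as follows. This Python sketch is an illustration only, with variable names of our own choosing; it uses `math.comb` for the binomial coefficients.

```python
from math import comb

observed = [3, 10, 43, 116, 221, 247, 202, 115, 34, 9, 0]   # x = 0, 1, ..., 10
n_trials, p = 10, 0.5
total = sum(observed)

# Expected frequencies: 1000 times the binomial probabilities, rounded.
expected = [round(total * comb(n_trials, x) * p ** x * (1 - p) ** (n_trials - x))
            for x in range(n_trials + 1)]

# Sample mean and second moment about the mean.
mean = sum(x * f for x, f in enumerate(observed)) / total
variance = sum(x * x * f for x, f in enumerate(observed)) / total - mean ** 2
print(expected)
print(round(mean, 3), round(variance, 3))
```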

23. The Binomial Distribution of Poisson. The
ordinary binomial distribution is often called the
Bernoullian distribution, after James Bernoulli, who first
(in Ars Conjectandi, a work published in 1713, eight years
after his death) investigated it in detail. S. D. Poisson
in 1837 considered the problem of $n$ trials, but with the
system $S$ varied each time so as to produce possibly
different probabilities of success $p_j$, where $j = 1, 2, \dots, n$.
The g.f. is therefore

$$(p_1t+q_1)(p_2t+q_2)\dots(p_nt+q_n), \qquad (1)$$
and so the f.m.g.f. is
$$(1+p_1\alpha)(1+p_2\alpha)\dots(1+p_n\alpha). \qquad (2)$$

The coefficient of $\alpha$ in this f.m.g.f. gives us the mean, or
mathematical expectation of the number of successes, as
$\sum p_j$. Let us write

$$\sum p_j = np, \qquad (3)$$

in order that we may later compare the moments with
those of a Bernoullian distribution with the same mean
probability $p$ and so characterized by the g.f. $(pt+q)^n$.
The Poisson f.m.g.f. about the mean is (compare the details
of 22 (2))

$$\prod_j (1+\alpha)^{-p_j}(1+p_j\alpha) \qquad (\prod = \text{product})$$
$$= \prod_j [1+p_jq_j\alpha^2/2!-2p_jq_j(p_j+1)\alpha^3/3!+\dots]$$
$$= 1+\sum p_jq_j\,\alpha^2/2!-2\sum p_jq_j(p_j+1)\,\alpha^3/3!+\dots. \qquad (4)$$

Hence $\mu_{(2)} = \sum p_jq_j$, and $\mu_{(3)} = -2\sum p_jq_j(p_j+1)$,
so that
$$\mu_2 = \mu_{(2)} = \sum p_jq_j,\qquad \mu_3 = \mu_{(3)}+3\mu_{(2)} = \sum p_jq_j(q_j-p_j). \qquad (5)$$

24. Comparison of Bernoullian and Poissonian
Variance. It will now be proved that the Poissonian
variance, let us say $\sigma_P^2$, is less than the Bernoullian, $\sigma_B^2$.
At first sight this may seem surprising, for one might
imagine that the variation of probability of success in
trials within the experiment would increase the variance
of $x$, the number of successes. If we consider, however,
the case of extreme variation of probability, namely the
case in which some of the trials are certain of success,
and the rest are certain of failure, we shall see that the
smaller variance is natural enough; for in this extreme
instance the value of $x$ is constant and so its variance is
zero.

The fact that the Poissonian variance is less than the
Bernoullian is valuable, for it suggests a test for the
constancy or otherwise of the system $S$ from one trial
to the next, in other words, for statistical homogeneity
within the experiment.

As in 23, let $p$ be the mean probability, $p = \sum p_j/n$.
We have at once, by the usual transference to the mean,

$$\sigma_p^2 = \sum (p_j-p)^2/n = \sum p_j^2/n - p^2, \qquad (1)$$

where $\sigma_p^2$ is the variance of probability in the $n$ trials.
Hence

$$\sigma_P^2 = \sum p_jq_j = np-np^2-\sum (p_j-p)^2 = npq-\sum (p_j-p)^2, \qquad (2)$$

that is,
$$\sigma_P^2 = \sigma_B^2-n\sigma_p^2. \qquad (3)$$

This result shows not only that the Poissonian variance
is less than the Bernoullian, but by how much it is less.
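
The relation between the Poissonian and the Bernoullian variance can be verified on a small numerical case. In the Python sketch below the five probabilities are hypothetical values chosen purely for illustration, and the function names are ours. The exact distribution of the number of successes is built by multiplying out the generating function one factor at a time, and its variance is compared with $npq - n\sigma_p^2$.

```python
def success_distribution(probs):
    """Exact distribution of the number of successes, from the product g.f."""
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for x, mass in enumerate(dist):
            nxt[x] += mass * (1 - p)      # failure at this trial
            nxt[x + 1] += mass * p        # success at this trial
        dist = nxt
    return dist

def variance(dist):
    mean = sum(x * m for x, m in enumerate(dist))
    return sum((x - mean) ** 2 * m for x, m in enumerate(dist))

probs = [0.1, 0.3, 0.5, 0.7, 0.9]          # hypothetical p_j, for illustration only
n = len(probs)
p_bar = sum(probs) / n
var_of_p = sum((p - p_bar) ** 2 for p in probs) / n
poissonian = variance(success_distribution(probs))
bernoullian = n * p_bar * (1 - p_bar)
print(poissonian, bernoullian - n * var_of_p)
```

If all the probabilities are set to 0 or 1 the distribution collapses to a single value and the variance vanishes, which is the extreme case described in the text.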

25. The Lexian Distribution. The extension made
by Poisson to the Bernoullian scheme consisted in varying
the probability of success among the $n$ trials, but within
the experiment. A different kind of extension was
considered by the German economist, W. Lexis, in 1877.
The probability was taken by Lexis as constant in the
$n$ trials of one experiment, but as varying among $k$ such
experiments.

Let $k$ Bernoullian sets of $n$ repeated trials be made,
each with constant probability of success within the set.
Let $p_i$ be the probability for the $i$th set, where $i = 1,
2, \dots, k$, and let $x_i$ be the number of successes recorded
in it. It is required to find the mean and variance of
the distribution of the $x_i$.

The sets are here mutually exclusive, and the probability
of each, if we imagine one of the $p_i$ to be chosen and $n$
trials to be then made, is $1/k$. Also the f.m.g.f. of $x_i$ is
$(1+p_i\alpha)^n$. Thus the f.m.g.f. of the Lexian distribution is

$$k^{-1}\sum_i (1+p_i\alpha)^n. \qquad (1)$$

The coefficient of $\alpha$ shows that the mean is $n\sum p_i/k$. For
comparison with a repeated Bernoullian scheme let us put
$np = n\sum p_i/k$. The f.m.g.f. about the mean is then

$$k^{-1}(1+\alpha)^{-np}\sum_i (1+p_i\alpha)^n \qquad (2)$$
$$= [1-np\alpha+np(np+1)\alpha^2/2!-\dots]\times[1+np\alpha+n(n-1)k^{-1}\sum p_i^2\,\alpha^2/2!+\dots],$$

whence, by picking out the coefficient of $\alpha^2/2!$,

$$\mu_2 = \mu_{(2)} = np(np+1)-2n^2p^2+n(n-1)[kp^2+\sum (p_i-p)^2]/k$$
$$= npq+n(n-1)\sum (p_i-p)^2/k,$$
that is,
$$\sigma_L^2 = npq+n(n-1)\sigma_p^2. \qquad (3)$$

Thus, whereas the Poissonian variance was less than
the Bernoullian, we see that the Lexian variance exceeds
the Bernoullian by an amount which increases strongly
with $n$, because of the coefficient $n(n-1)$ in (3).
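
The Lexian variance can be checked in the same spirit. In the Python sketch below the three set probabilities and the number of trials are hypothetical, and the function name is ours. The exact distribution is formed as an equally weighted mixture of binomials, and its variance is compared with $npq + n(n-1)\sigma_p^2$.

```python
from math import comb

def lexian_variance(probs, n):
    """Variance of x when one of k equally likely p_i is chosen and n trials made."""
    k = len(probs)
    dist = [sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for p in probs) / k
            for x in range(n + 1)]
    mean = sum(x * m for x, m in enumerate(dist))
    return sum((x - mean) ** 2 * m for x, m in enumerate(dist))

probs, n = [0.4, 0.5, 0.6], 20              # hypothetical p_i and n
p_bar = sum(probs) / len(probs)
var_of_p = sum((p - p_bar) ** 2 for p in probs) / len(probs)
exact = lexian_variance(probs, n)
formula = n * p_bar * (1 - p_bar) + n * (n - 1) * var_of_p
print(exact, formula)
```

Even this modest fluctuation of probability raises the variance from 5 to about 7.5, which shows how strongly the term in $n(n-1)$ tells.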

26. Coolidge's Extension of the Lexian Scheme.
It is a natural extension to consider, as J. L. Coolidge
did in 1921, the distribution which arises not from $k$
Bernoullian but from $k$ Poissonian sets, each with a different
set of probabilities in its constituent $n$ trials.

Let $p_{ij}$ be the probability of success in the $j$th trial of
the $i$th set. Then, just as in 25, the f.m.g.f. is

$$k^{-1}\sum_i \prod_j (1+p_{ij}\alpha). \qquad (1)$$

Let us write $\sum_j p_{ij} = np_{i0}$, $\sum_i p_{i0} = kp$. Then the mean of
the distribution is evidently $np$. Transferring the f.m.g.f.
to the mean, and picking out the coefficient of $\alpha^2/2!$,
we find, after three or four lines of algebra,

$$\mu_2 = \mu_{(2)} = npq+\frac{n^2}{k}\sum_i (p_{i0}-p)^2-\frac{1}{k}\sum_i\sum_j (p_{ij}-p)^2.$$

It is appropriate to regard the three terms of this
expression as of Bernoullian, Lexian and Poissonian type
respectively. Certain special cases are easily perceived;
for example, when $p_{i0} = p$, that is to say, when the mean
probability in each set of trials is the same for all sets,
a variance emerges which slightly generalises the Poissonian
variance and, like it, is less than the Bernoullian.

An alternative form is

$$\mu_2 = npq+\frac{n(n-1)}{k}\sum_i (p_{i0}-p)^2-\frac{1}{k}\sum_i\sum_j (p_{ij}-p_{i0})^2,$$

which we may write as

$$\sigma_C^2 = \sigma_L^2-\frac{1}{k}\sum_i\sum_j (p_{ij}-p_{i0})^2, \qquad (2)$$

where $\sigma_L^2$ is the Lexian variance 25 (3) formed from the set means $p_{i0}$.

This result shows that non-homogeneity, or fluctuation
of probability, within the trials of an experiment is of far
less effect, when $n$ is large, than fluctuation in mean
probability from one set to another. In fact in many
cases $\sigma_C^2$ differs only slightly from the corresponding $\sigma_L^2$.

Analysis of Variance. The results which we have
obtained for the Lexian and Coolidge schemes exhibit the
variance as resolved into separate components of variance.
The Bernoullian component may be called the random
component, since it arises even when probability is
constant, while the Lexian component may be called the
systematic component, since it arises from the systematic
alteration or variation of probability from one experiment
to another. This resolution of variance into separate
components of variance has been called analysis of variance.
It has been greatly extended by Professor R. A. Fisher, who
has devised regular schemes of experimental arrangement
involving many variates, by means of which not one but
several systematic components of variance can be isolated
(75) from each other and from the random component.

27. Charlier's Criteria of Homogeneity Based on
Dispersion. The test of homogeneity or stability
considered in this section would now be superseded or
amplified by modern methods of analysis of variance,
but it is interesting in itself.

We have approximately, in the Lexian and Coolidge
schemes,

$$\sigma_L^2 = \sigma_B^2+n^2\sigma_p^2. \qquad (1)$$

Hence
$$\sigma_p^2 = (\sigma_L^2-\sigma_B^2)/n^2 = p^2(\sigma_L^2-\sigma_B^2)/\mu_1'^2, \qquad (2)$$
where $\mu_1' = np$, the mean of the distribution.

Hence
$$\sigma_p/p = \sqrt{\sigma_L^2-\sigma_B^2}\,/\mu_1'. \qquad (3)$$

Charlier denoted this by $\rho$, naming it the "coefficient of
perturbation" of a Lexian distribution. He turned it
into a percentage by taking $100\rho$. From (3) we see that
$\rho$ measures the relative fluctuation of probability.

Example. Classing 288,000 Swedish births in 576 sets
of 500 each, according to different months and different
districts, Charlier found for $x$, the number of male births
in a set,

$m_1' = 257.12$, $s_L = 12.49$, $n = 500$, $k = 576$.

Hence $p = m_1'/n = 0.514$, $q = 0.486$, not à priori, but as
estimated from the large sample of 288,000; and so

$\sigma_B = \sqrt{npq} = \sqrt{124.9} = 11.18.$

Hence $100\rho = 100\sqrt{156.0-124.9}/257 = 2.17$ per cent.

The conclusion made is that a male birth in Sweden is an
event of 51.4 per cent probability, with a standard deviation
of 51.4 × 0.0217, or about 1.1 per cent probability.
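
The computation of Charlier's coefficient from the observed mean and standard deviation is easily mechanised. The Python sketch below is an illustration only; the function name `perturbation_percent` is hypothetical. It estimates $p$ from the sample mean, forms the Bernoullian variance $npq$, and returns $100\rho$.

```python
from math import sqrt

def perturbation_percent(mean, sd, n):
    """Charlier's 100*rho from the observed mean and standard deviation of x.

    Assumes sd**2 exceeds the Bernoullian variance n*p*q; otherwise the
    square root is undefined and a ValueError is raised.
    """
    p = mean / n
    bernoullian_var = n * p * (1 - p)
    return 100 * sqrt(sd ** 2 - bernoullian_var) / mean

rho_percent = perturbation_percent(mean=257.12, sd=12.49, n=500)
print(round(rho_percent, 2))
```

When the observed variance falls short of $npq$ the coefficient is imaginary, a circumstance which would point rather to a scheme of Poissonian type.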

28. Types of Multinomial Distribution. The binomial
distribution, of Bernoullian or Poissonian type, is a
special case of the multinomial distribution, the forms of
which are so many and so various as almost to defeat
classification. We have seen a simple example in the
probability distribution of totals of points in $n$ throws of a
die, or a single throw of $n$ similar dice. Here the g.f., for
biassed dice, is

$$(p_1t+p_2t^2+p_3t^3+p_4t^4+p_5t^5+p_6t^6)^n, \qquad (1)$$

and it is best to leave the distribution in this symbolized
form, and not to expand by the multinomial theorem.
The generalization to the case of $n$ different dice, possibly
with different numbers of faces, is easily seen.

Ex. 1. Prove that the mean value of the total in $n$ throws
of a biassed die is $n(p_1+2p_2+3p_3+4p_4+5p_5+6p_6)$.

Ex. 2. Find, by constructing the f.m.g.f., the variance
and standard deviation of the total of points in a throw of
$n$ symmetrical six-sided dice.

The f.m.g.f. reduces to

$$\left[\frac{(1+\alpha)\{(1+\alpha)^6-1\}}{6\alpha}\right]^n.$$

Hence $\mu_2 = 35n/12$, and so $\sigma = \sqrt{35n}/2\sqrt{3}$.
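
The result of Ex. 2 may be confirmed by expanding the generating function directly. The following Python sketch is illustrative, with function names of our own; it builds the exact distribution of the total for $n$ symmetrical dice in rational arithmetic and computes its mean and variance.

```python
from fractions import Fraction

def total_distribution(n_dice, faces=6):
    """Exact distribution of the total of n symmetrical dice."""
    dist = {0: Fraction(1)}
    for _ in range(n_dice):
        nxt = {}
        for total, mass in dist.items():
            for face in range(1, faces + 1):
                nxt[total + face] = nxt.get(total + face, 0) + mass / faces
        dist = nxt
    return dist

def mean_and_variance(dist):
    mean = sum(t * m for t, m in dist.items())
    var = sum((t - mean) ** 2 * m for t, m in dist.items())
    return mean, var

mean, var = mean_and_variance(total_distribution(3))
print(mean, var)   # for n = 3 the variance should equal 35*3/12
```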

29. Sampling without Replacement, Hypergeometric
Distribution. When in sampling a population the
individual drawn is not replaced, the result of one drawing
influences the probability of the next, so that the successive
drawings are not independent. Hence it is no longer
possible to combine into a product the g.f.'s of the separate
drawings. It is true that the difficulty can be
circumvented by the introduction of symbolic products, with
due precautions in expansion, but we shall here proceed
from first principles.

Let us consider a population of $N$ individuals, of whom
$M = Np$ are of character $A$, so that the probability of
drawing an $A$ at the first drawing is $p$. Let $n$ drawings
be made, no individual drawn being replaced after the
drawing. It is required to find the probability distribution
of $x$, the number of individuals $A$ drawn.

The probability of $x$ successes $A$, $n-x$ failures $\bar A$,
occurring in some particular order, is

$$\frac{M(M-1)\dots(M-x+1)\,(N-M)(N-M-1)\dots(N-M-n+x+1)}{N(N-1)\dots(N-n+1)}, \qquad (1)$$

as is readily seen by considering how the numbers in
population, and in categories $A$ or $\bar A$, are depleted by 1
at each drawing. But there are $\binom{n}{x}$ possible orders in
which $x$ successes may eventuate among $n$ drawings.
Hence the desired probability is

$$\phi(x) = \binom{n}{x}\frac{M^{(x)}(N-M)^{(n-x)}}{N^{(n)}}, \qquad (2)$$

where $M^{(x)} = M(M-1)(M-2)\dots(M-x+1)$, and so on.

Just as the binomial probability function of 22 was a
typical term in the binomial expansion of $(pt+q)^n$, so this
function that we have just found is a typical term in a
certain series, a hypergeometric series. (The series is
however of higher type than the ordinary Gaussian
hypergeometric series.) Hence $\phi(x)$ is often called the
hypergeometric probability function.

The g.f. is

$$\sum_{x=0}^{n}\phi(x)\,t^x, \qquad (3)$$

and so the f.m.g.f. is

$$\sum_{x=0}^{n}\phi(x)(1+\alpha)^x, \qquad (4)$$

which may be evaluated (with some trouble if only
elementary methods are used) as

$$1+\frac{n^{(1)}M^{(1)}}{N^{(1)}}\,\alpha+\frac{n^{(2)}M^{(2)}}{N^{(2)}}\,\alpha^2/2!+\frac{n^{(3)}M^{(3)}}{N^{(3)}}\,\alpha^3/3!+\dots, \qquad (5)$$

a terminating hypergeometric series which in the notation
of Gauss would be written $F(-M, -n;\ -N;\ -\alpha)$. The
mean is thus $Mn/N$, and the $r$th factorial moment is $n^{(r)}M^{(r)}/N^{(r)}$.
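
The factorial moments of the hypergeometric distribution can be verified in exact arithmetic. In the Python sketch below the population (a pack of 52 cards of which 26 are black, with 10 drawn) is a hypothetical choice for illustration, and the function names are ours. The probabilities are computed from the product form of the probability function, and the factorial moments of orders 1, 2 and 3 are compared with the ratio of descending factorials of $n$, $M$ and $N$.

```python
from fractions import Fraction
from math import comb

def falling(a, r):
    """Descending factorial a(a-1)...(a-r+1), equal to 1 when r = 0."""
    out = 1
    for i in range(r):
        out *= a - i
    return out

def hypergeometric_pmf(x, N, M, n):
    """Probability of x individuals A in n drawings without replacement."""
    return Fraction(comb(n, x) * falling(M, x) * falling(N - M, n - x), falling(N, n))

def factorial_moment(r, N, M, n):
    return sum(falling(x, r) * hypergeometric_pmf(x, N, M, n) for x in range(n + 1))

N, M, n = 52, 26, 10      # hypothetical population: 10 cards drawn, A = black card
print(factorial_moment(1, N, M, n), factorial_moment(2, N, M, n))
```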

The examples which have now been given of probability
distributions have shown how numerous and varied are
the types of distribution. In fact, any proposed probability
function may be simulated by a suitably constructed model
or population, and special samplings of this population
give rise to further probability functions. Fortunately,
when the number $n$ of trials is large, many of these
probability distributions tend with good approximation
towards one or other of a few dominant types, which we
shall now consider.

30. Important Approximate Distributions : Types 
A and B. When the coefficients of in the Bernoullian 
binomial g.f. (p^+2')” are taken as probability ordinates 
y = we may join the tops of the ordinates to form 
a probability polygon. If this is done for increasing 
values of n, the mean being taken as origin and the 
standard deviation V(^M) of scale, it is found 

that the successive probability polygons tend to lose any 
initial asymmetry due to inequality of p and q. 

In fact the coefficient of skewness is

$$\beta_1 = \mu_3^2/\mu_2^3 = \{npq(q-p)\}^2/(npq)^3 = (q-p)^2/npq, \tag{1}$$

which evidently tends to zero as n increases, unless either
of p or q is of the order of magnitude of 1/n, let us say
$O(1/n)$, in which case the skewness remains appreciable.
Not only so but, apart from the exception just mentioned,
these binomial curves are found to cluster towards a
limiting symmetrical shape, the same for all. The curve
to which they thus approach asymptotically is of paramount
importance in statistics, and is called the normal probability
curve. It is the asymptotic shape not merely of the
Bernoullian binomial but of the Poissonian, as well as of
the multinomial and of many other distributions, and it is
characterized by the probability differential

$$dp = \phi(x)\,dx = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/2\sigma^2}\,dx, \tag{2}$$

where $\mu$ is the mean, $\sigma$ the standard deviation.
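The behaviour of the coefficient of skewness (1) in the two cases, p fixed and p of order 1/n, may be seen numerically. The sketch below is illustrative only; the function name beta1 and the chosen values of n and p are our own assumptions.

```python
def beta1(n, p):
    """Coefficient of skewness (q - p)^2 / (npq) of the binomial distribution."""
    q = 1.0 - p
    return (q - p) ** 2 / (n * p * q)

# First column of results: p fixed at 0.2, so that beta1 tends to zero.
# Second column: p = 2/n, so that the mean np stays at 2 and beta1 tends to 1/2.
for n in (10, 100, 1000, 10000):
    print(n, beta1(n, 0.2), beta1(n, 2.0 / n))
```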

When small corrective terms involving n are retained,
a closer representation is given by the probability function
of Type A, namely

$$p(x) = \phi(x)-a_3\phi'''(x)/3!+a_4\phi^{\mathrm{iv}}(x)/4!-\dots, \tag{3}$$

where $\phi(x)$ denotes the normal probability function, and
the coefficients $a_3, a_4, \dots$ of the derivatives are of
irregularly decreasing orders of magnitude with respect to
n. The coefficient $a_3$ measures skewness, $a_4$ measures
excess.

As noted above, the case when p is very small is
exceptional. If p is $O(1/n)$, the mean np is not $O(n)$
but $O(1)$. In this case the normal function is not the
most suitable basis of approximation, and the appropriate
asymptotic probability function is Poisson’s function of
statistical rareness, namely

$$\psi(x) = e^{-\mu}\mu^{x}/x!, \tag{4}$$

where $\mu$ is the mean. Here again, when terms of smaller
order involving n are retained, a closer representation is
given by the probability function of Type B, namely

$$p(x) = \psi(x)+b_2\nabla^2\psi(x)/2!-b_3\nabla^3\psi(x)/3!+\dots, \tag{5}$$

where $\psi(x)$ is Poisson’s function (4) above, and $\nabla$ denotes
the operation of forming the receding difference, so that
$\nabla\psi(x) = \psi(x)-\psi(x-1)$. It proves to be the case that
$b_2$ is $O(n^{-1})$, $b_3$ and $b_4$ are $O(n^{-2})$, $b_5$ and $b_6$ are
$O(n^{-3})$, and so on.

We now consider the derivation of these functions.

31. The Normal Function as Limit of the Binomial. 

The rigorous derivation of the normal function as generated 
by compounding n independent distributions, and the 
discussion of necessary and sufficient conditions, require 
advanced mathematics beyond our scope. We content 
ourselves here with elementary and incomplete treatments. 

Consider first the binomial g.f. $(pt+q)^n$, where p is
not of order 1/n, but is $O(1)$. Putting $t = e^{\alpha}$ we have
the m.g.f.

$$(pe^{\alpha}+q)^n = (1+p\alpha+p\alpha^2/2!+p\alpha^3/3!+\dots)^n. \tag{1}$$

The mean is np. Let us transfer to the mean, and to
discover the limiting shape of the curve of probability
let us alter the scale, so as to find the distribution, not of
actual number of successes x, but of the deviation
$(x-np)/n$ of the relative frequency of successes from the
mean p of relative frequency.

As a first step we construct the m.g.f. of x/n. By
11 it is

$$[1+p\alpha/n+p\alpha^2/(2!\,n^2)+O(n^{-3})]^n$$
$$= [(1+p\alpha/n+\tfrac12 p^2\alpha^2/n^2)(1+\tfrac12(p-p^2)\alpha^2/n^2)+O(n^{-3})]^n, \tag{2}$$

where $O(n^{-3})$ indicates in both cases remainder terms of
order $n^{-3}$. As n increases this m.g.f. tends asymptotically to

$$\exp(p\alpha)\,\exp(\tfrac12 pq\alpha^2/n). \tag{3}$$

The first factor shows that the mean of the transformed
variate is p; but this we already know. The second
factor indicates, by a further obvious transformation of
scale, that the m.g.f. of the standardized deviation
$z = (x-np)/\sqrt{npq}$ is

$$\exp(\tfrac12\alpha^2). \tag{4}$$

Now the possible number x of successes may range from
0 to n. Thus the values of z may range from $-\sqrt{np/q}$
to $+\sqrt{nq/p}$, a range which tends in both directions to
infinity. Further, consecutive values of x differ by 1,
and so consecutive values of z differ by $1/\sqrt{npq}$, an
interval which tends to zero as n increases. We therefore
seek a representation of the probability function $\phi(z)$ as
a positive function continuous over the range $-\infty$ to $\infty$;
and the question is, what function $\phi(z)$ is such that its
m.g.f.

$$\int_{-\infty}^{\infty}\phi(z)\,e^{\alpha z}\,dz = \exp(\tfrac12\alpha^2)\,? \tag{5}$$
The answer is contained in a theorem, to the effect that
the only positive continuous function satisfying this
relation for some continuous range of values of $\alpha$ is

$$\phi(z) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12 z^2}, \tag{6}$$

and this is the normal probability function, in standard
form.

The reader should become thoroughly familiar both with 
this form and with the unstandardized form of 30 (2). 

Incidentally, taking the logarithm of the m.g.f. (4),
we see that apart from the mean or first seminvariant
there is only one other seminvariant, namely $\lambda_2$ or $\sigma^2$.
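The approach of the standardized binomial to the normal function may be illustrated numerically. The sketch below, which is ours and not part of the original text, compares the binomial ordinates, multiplied by $\sqrt{npq}$ to allow for the change of scale, with the normal ordinates at the corresponding values of z; the choice p = 0.3 is arbitrary.

```python
from math import comb, exp, pi, sqrt

def binom_pmf(x, n, p):
    """Coefficient of t^x in (pt + q)^n."""
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)

def phi(z):
    """Normal probability function in standard form."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def max_gap(n, p):
    """Largest difference between scaled binomial and normal ordinates."""
    s = sqrt(n * p * (1.0 - p))
    return max(abs(s * binom_pmf(x, n, p) - phi((x - n * p) / s))
               for x in range(n + 1))

for n in (25, 100, 400):
    print(n, max_gap(n, 0.3))
```

The greatest discrepancy should be seen to fall off roughly as $1/\sqrt{n}$, in keeping with the order of the skewness.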


32. Properties of the Normal Probability Function.
The curve of the normal function is a symmetrical
bell-shaped curve, extending to infinity on either side and
flattening rapidly upon the axis of x.

The maximum ordinate is $y_0 = 1/\sqrt{2\pi}$. The area
under the curve is

$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-\frac12 z^2}\,dz = 1, \tag{1}$$

by the well-known integral. (Gillespie, Integration, p. 88.)
The points of inflexion, given by $d^2y/dz^2 = 0$, will be
found to be at $z = \pm 1$, or, in unstandardized units, at
deviations $\pm\sigma$ from the centre.

The probability, as taken from the normal curve, that
a deviation from the mean is numerically less than z is
the area under the curve between the ordinates for $-z$
and $+z$, namely

$$\frac{1}{\sqrt{2\pi}}\int_{-z}^{z}e^{-\frac12 z^2}\,dz. \tag{2}$$

This function, called the error function or probability
integral, is denoted by erf(z) and has been extensively
tabulated. (It is called the error function because the typical
distribution of errors committed by instruments of
observation has been found to be sensibly normal.) The following
short table shows how the probability of deviations outside
the range (-z, z) diminishes as z increases:

z        0   0.5    1.0    1.5    2.0    2.5    3.0    3.5     4.0
erf(z)   0   0.383  0.683  0.866  0.954  0.988  0.997  0.9995  0.99994

We may note that the probability of a deviation
greater than $\sigma$ is about 1/3 or more nearly 7/22; that
of one greater than $2\sigma$ is about 1/20 or more nearly 1/22;
that of one greater than $3\sigma$ is about 1/370; and that of
one greater than $4\sigma$ is about 1/17000.

The quartile deviation or so-called “probable error”
is given by

$$\mathrm{erf}(z) = \tfrac12. \tag{3}$$

By interpolation it is found to be z = 0.6745 nearly,
corresponding to a deviation from the mean of about 2/3
or more nearly 27/40 of the standard deviation.

The mean absolute deviation is given by

$$\frac{2}{\sqrt{2\pi}}\int_{0}^{\infty}z\,e^{-\frac12 z^2}\,dz = \sqrt{2/\pi} = 0.7979 \text{ nearly}, \tag{4}$$

corresponding to about 4/5 of the standard deviation.
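The tabulated values, the probable error and the mean absolute deviation can be recomputed with a modern library. One point of notation needs care: the erf(z) of the text is the area between -z and +z under the standard normal curve, and this equals the error function of present-day libraries evaluated at $z/\sqrt{2}$. The sketch below rests on that identification; the function names are our own.

```python
from math import erf, pi, sqrt

def prob_within(z):
    """The text's erf(z): probability that a standardized normal deviation
    is numerically less than z. Equals math.erf(z / sqrt(2))."""
    return erf(z / sqrt(2.0))

def probable_error(tol=1e-10):
    """Solve prob_within(z) = 1/2 by bisection on the interval (0, 1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_within(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for z in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0):
    print(z, round(prob_within(z), 5))
print("probable error:", round(probable_error(), 4))
print("mean absolute deviation:", round(sqrt(2.0 / pi), 4))
```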

The higher moments of the normal function are found
by expanding the m.g.f. $\exp(\tfrac12\alpha^2)$, or the unstandardized
$\exp(\tfrac12\sigma^2\alpha^2)$, and observing the coefficients of $\alpha^r/r!$. For
odd orders they vanish, for even orders 2r they are given by

$$\mu_{2r} = (\tfrac12)^r\sigma^{2r}(2r)!/r!.$$

In particular

$$\mu_4 = 3\sigma^4, \tag{5}$$

so that (17) the coefficient of excess $\beta_2 = \mu_4/\mu_2^2 = 3$.


33. Poissonian Function of Rare Statistical
Frequency. We return to the binomial g.f. $(pt+q)^n$,
examining the previously excepted case in which, though
n becomes large, p is so small that the mean np is $O(1)$;
in fact $p = O(n^{-1})$. Writing the mean np as $\mu$, we have
$p = \mu/n$. The f.m.g.f. is therefore (22)

$$(1+\mu\alpha/n)^n, \text{ which tends to } \exp(\mu\alpha) \tag{1}$$

as n increases. This is the f.m.g.f. of the Poissonian
function. The probability g.f. is therefore

$$\exp\{\mu(t-1)\} = e^{-\mu}e^{\mu t}, \tag{2}$$

and the coefficient of $t^x$ in this gives the desired probability
function as

$$\psi(x) = e^{-\mu}\mu^{x}/x!. \tag{3}$$
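The passage from the binomial to the Poissonian function may be checked numerically by keeping the mean np fixed while n increases. This is a minimal sketch of our own; the mean 2 and the range of x examined are arbitrary choices.

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    """Coefficient of t^x in (pt + q)^n."""
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)

def poisson_pmf(x, mu):
    """Poisson's function e^(-mu) mu^x / x!."""
    return exp(-mu) * mu**x / factorial(x)

def max_gap(n, mu, upto=30):
    """Largest difference of ordinates for x = 0, 1, ..., upto-1, with p = mu/n."""
    p = mu / n
    return max(abs(binom_pmf(x, n, p) - poisson_pmf(x, mu)) for x in range(upto))

for n in (50, 200, 1000):
    print(n, max_gap(n, 2.0))
```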

34. Properties of the Poissonian Function. The
normal function contains two parameters, the mean $\mu$
and the standard deviation $\sigma$. The Poissonian function
has one parameter only, the mean $\mu$. The range of the
function is from x = 0 to x = $\infty$. For $\mu<1$ the
probability polygon is J-shaped, for $\mu>1$ it becomes
double-sided and for large values of $\mu$ tends to acquire symmetry.
Indeed, for large values of $\mu$ the shape is approximately
normal; for the ordinary m.g.f. is

$$\exp[\mu(e^{\alpha}-1)] = \exp(\mu\alpha+\mu\alpha^2/2!+\mu\alpha^3/3!+\dots) \tag{1}$$

and if we change the scale so as to make $\sqrt{\mu}$ the unit we
obtain the g.f.

$$\exp(\mu^{1/2}\alpha+\alpha^2/2!+\alpha^3/3!\,\mu^{1/2}+\dots), \tag{2}$$

which, to a first approximation (that is, including the
first two terms of the series in the bracket) is the m.g.f.
of a normal function with mean $\sqrt{\mu}$ and unit standard
deviation.

The logarithm of (1) gives the seminvariant g.f. of the
Poissonian function as $\mu(e^{\alpha}-1)$, which shows that all the
seminvariants are equal to the mean $\mu$; in particular
the variance $\lambda_2$ or $\sigma^2$ is equal to $\mu$.

There is only one factorial seminvariant, $\lambda_{(1)} = \mu$.

35. More General Derivations ; Types A and B.
As before, the extensions of the domain of application of
the fundamental distributions given below are not
established under the widest conditions.

Let us consider the compounding of n systems $S_j$,
where j = 1, 2, ..., n, where each system has finite
seminvariants, all $O(1)$. The seminvariant g.f. of $S_j$ is then
a convergent series

$$L_j(\alpha) = \lambda_1^{(j)}\alpha+\lambda_2^{(j)}\alpha^2/2!+\lambda_3^{(j)}\alpha^3/3!+\dots \tag{1}$$

For example, the binomial distribution of n throws of a
coin, provided that p is $O(1)$ and not $O(n^{-1})$, may be proved
to have an s.g.f. of this kind.

Now imagine all the n systems $S_j$ to operate independently,
the results being added to make a variate x. By
the additive property of seminvariants the seminvariant
g.f. of x is

$$\sum_{j}\,(\lambda_1^{(j)}\alpha+\lambda_2^{(j)}\alpha^2/2!+\lambda_3^{(j)}\alpha^3/3!+\dots), \tag{2}$$

and the s.g.f. about the mean of x is the same with the
term in $\alpha$ removed. The second seminvariant of x is
clearly $O(n)$, and so the standard deviation is $O(n^{1/2})$. Let
us therefore alter the scale so that $x/\sqrt{n}$ becomes the
variate. The s.g.f. of this variate is then

$$\lambda_1\alpha+\lambda_2\alpha^2/2!+\lambda_3\alpha^3/3!+\dots, \tag{3}$$

where $\lambda_1$ is $O(n^{1/2})$, $\lambda_2$ is $O(1)$, $\lambda_3$ is $O(n^{-1/2})$ and in general
$\lambda_r$ is $O(n^{1-r/2})$. Again, the s.g.f. about the mean is the
same with the term in $\alpha$ removed.

Thus as n increases the dominant term in the s.g.f.
about the mean is $\lambda_2\alpha^2/2!$, which is the s.g.f. of a normal
function.

If, however, we retain the terms of smaller order, while
choosing the scale so that $\lambda_2 = 1$, the m.g.f. about the
mean is

$$M(\alpha) = \exp(\tfrac12\alpha^2)\exp(\lambda_3\lambda_2^{-3/2}\alpha^3/3!+\lambda_4\lambda_2^{-2}\alpha^4/4!+\dots)$$
$$= \exp(\tfrac12\alpha^2)(1+a_3\alpha^3/3!+a_4\alpha^4/4!+\dots), \tag{5}$$

where the second factor in brackets on the right arises
from the expansion of the second exponential in the first
line. Now if a probability function P(x), which vanishes
with all its derivatives at the boundaries, has m.g.f.

$$M(\alpha) = \int P(x)\,e^{\alpha x}\,dx, \tag{6}$$

it may be proved by r integrations by parts that

$$\alpha^{r}M(\alpha) = (-1)^{r}\int P^{(r)}(x)\,e^{\alpha x}\,dx. \tag{7}$$

Thus here, reverting the m.g.f. of (5) term by term, we
derive the corresponding probability function as

$$p(z) = \phi(z)-a_3\phi'''(z)/3!+a_4\phi^{\mathrm{iv}}(z)/4!-\dots, \tag{8}$$

provided that the series for m.g.f. and probability function
are convergent. This is the probability function of Type A.
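A Type A series truncated after the term in $a_4$ is easily evaluated, since the derivatives of the normal function are polynomial multiples of it: $\phi'''(z) = -(z^3-3z)\phi(z)$ and $\phi^{\mathrm{iv}}(z) = (z^4-6z^2+3)\phi(z)$. The sketch below, which is ours and not from the original text, evaluates such a series and checks by numerical integration that the area is 1, the third moment is $a_3$ and the fourth moment is $3+a_4$; the values of $a_3$ and $a_4$ are arbitrary.

```python
from math import exp, pi, sqrt

def phi(z):
    """Normal probability function in standard form."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def type_a(z, a3, a4=0.0):
    """phi(z) - a3*phi'''(z)/3! + a4*phi''''(z)/4!, written with the
    polynomial factors of the third and fourth derivatives."""
    h3 = z**3 - 3.0 * z
    h4 = z**4 - 6.0 * z**2 + 3.0
    return phi(z) * (1.0 + a3 * h3 / 6.0 + a4 * h4 / 24.0)

def moment(r, a3, a4=0.0, lo=-12.0, hi=12.0, steps=24000):
    """r-th moment about the origin by the trapezoidal rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * z**r * type_a(z, a3, a4)
    return total * h

a3, a4 = 0.2, 0.1
print(moment(0, a3, a4), moment(3, a3, a4), moment(4, a3, a4))
```

It may be noted that a truncated series of this kind can take small negative values in the tails when the coefficients are large, so that it is an approximation to a probability function rather than necessarily one itself.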

A close examination of the magnitude of terms in the
expansion of

$$\exp(\lambda_3\lambda_2^{-3/2}\alpha^3/3!+\lambda_4\lambda_2^{-2}\alpha^4/4!+\dots) \tag{9}$$

shows that the order of magnitude of coefficients in the
series of Type A is as follows:

$a_3 = O(n^{-1/2})$; $a_4$ and $a_6 = O(n^{-1})$; $a_5$, $a_7$ and $a_9 = O(n^{-3/2})$;

and later coefficients show a similar irregularity.

Here let us pause to point out a practical disadvantage
of the representation by Type A. If we are representing
a given frequency distribution by Type A, we must use
the observed moments to estimate the coefficients $a_3, a_4, \dots$
in Type A. Let us suppose that the convergence demands
the retention of terms up to $O(n^{-1})$. We must then include
not only $a_3$ and $a_4$ but also $a_6$. Now $a_6$ depends on the 6th
moment, and the 6th moment of the observations is subject
to very high sampling error (68). Hence the effort to
increase mathematical accuracy by retention of higher
terms is largely frustrated by the statistical inaccuracy
of the moments used to estimate those terms.

Series of Type B. The procedure for deriving the
function of Type B is rather similar. The f.m.g.f. proves
to be

$$\exp(\mu\alpha)(1+b_2\alpha^2/2!+b_3\alpha^3/3!+\dots), \tag{10}$$

which on reversion term by term gives

$$p(x) = \psi(x)+b_2\nabla^2\psi(x)/2!-b_3\nabla^3\psi(x)/3!+\dots, \tag{11}$$

the series of Type B, where $\psi(x)$ denotes Poisson’s function
of 33 (3). Here the order of magnitude of coefficients is
found to be:

$b_2 = O(n^{-1})$; $b_3$ and $b_4 = O(n^{-2})$; $b_5$ and $b_6 = O(n^{-3})$;

and so on. Thus in using the function of Type B for the
representation of a frequency distribution it is best to
truncate the series after a difference of even order.


36. Other Systems of Probability Functions : the
System of Pearson. We have seen how the functions of
Types A and B arise by the addition of seminvariant
(or factorial seminvariant) generating functions,
corresponding to the compounding of values of an additive
variate. But a variate of this kind is a very special
one. For example, if x is built up of added increments,
then $x^2$, which we might have occasion to use instead
of x, is certainly not the sum of the squares of those
increments. Indeed, as we may well anticipate, the
distribution of $x^2$ is different from that of x.

For this and for other reasons the scope of typical
probability functions has been widened, and systems
other than Type A and Type B have found acceptance.
One such system is the system introduced in 1895 by
Karl Pearson.

Let us consider the difference or differential equations
satisfied by some of the standard probability functions.
We shall use the receding difference operation defined by
$\nabla\phi(x) = \phi(x)-\phi(x-1)$.

(i) The binomial probability function $\phi(x)$ of 22 (1) satisfies

$$\nabla\phi(x) = -\frac{x-(n+1)p}{p(n-x+1)}\,\phi(x). \tag{1}$$

(ii) The Poissonian function $\psi(x)$ of 30 (4) satisfies

$$\nabla\psi(x) = -\frac{x-\mu}{\mu}\,\psi(x). \tag{2}$$

(iii) The hypergeometric probability function $\phi(x)$ of 29 (2)
satisfies

$$\nabla\phi(x) = -\frac{(N+2)x-(M+1)(n+1)}{(M-x+1)(n-x+1)}\,\phi(x). \tag{3}$$

(iv) The normal probability function in standard form
31 (6) satisfies

$$\frac{d}{dx}\phi(x) = -x\,\phi(x). \tag{4}$$
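Difference equations of this kind are readily verified by machine. The sketch below checks, for particular parameter values of our own choosing, that the receding difference of each discrete probability function equals $-(x-a)/d(x)$ times the function itself, with a = (n+1)p and d(x) = p(n-x+1) for the binomial, a = $\mu$ and d = $\mu$ for the Poissonian, and numerator (N+2)x-(M+1)(k+1), denominator (M-x+1)(k-x+1) for the hypergeometric with sample size k. These explicit forms are worked out here from the definitions and are an assumption as to the normalisation intended.

```python
from fractions import Fraction
from math import comb, exp, factorial

def binom(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x) if 0 <= x <= n else 0

def hyper(x, N, M, k):
    """x marked items in a sample of k from N items of which M are marked."""
    if x < 0 or x > min(M, k) or k - x > N - M:
        return Fraction(0)
    return Fraction(comb(M, x) * comb(N - M, k - x), comb(N, k))

def poisson(x, mu):
    return exp(-mu) * mu**x / factorial(x) if x >= 0 else 0.0

def checks():
    n, p = 8, Fraction(1, 3)
    N, M, k = 20, 7, 5
    mu = 2.5
    results = []
    for x in range(n + 1):  # binomial, exact arithmetic
        lhs = binom(x, n, p) - binom(x - 1, n, p)
        rhs = -(x - (n + 1) * p) / (p * (n - x + 1)) * binom(x, n, p)
        results.append(lhs == rhs)
    for x in range(k + 1):  # hypergeometric, exact arithmetic
        lhs = hyper(x, N, M, k) - hyper(x - 1, N, M, k)
        ratio = Fraction((N + 2) * x - (M + 1) * (k + 1),
                         (M - x + 1) * (k - x + 1))
        results.append(lhs == -ratio * hyper(x, N, M, k))
    for x in range(12):  # Poissonian, floating point
        lhs = poisson(x, mu) - poisson(x - 1, mu)
        rhs = -(x - mu) / mu * poisson(x, mu)
        results.append(abs(lhs - rhs) < 1e-12)
    return results

print(all(checks()))
```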


A number of other probability functions, arising
naturally in problems of repeated trials, might be added
to this list. The Pearsonian system consists of the
functions $y = \phi(x)$ which satisfy the differential equation

$$\frac{dy}{dx} = \frac{(x-a)\,y}{c_0+c_1x+c_2x^2}. \tag{5}$$

The functions are found by immediate integration;
thus

$$\log y = \int\frac{(x-a)\,dx}{c_0+c_1x+c_2x^2}, \tag{6}$$

whence y can be found by the methods of elementary
integral calculus. The quadratic in the denominator of
the integrand may have real, variously positive or negative,
or equal, or numerically equal but of opposite sign, or
complex roots; or again, with $c_2 = 0$, may degenerate
into a linear function, or with $c_1$ and $c_2 = 0$ into a constant.
These various cases yield the Pearsonian curves, usually
classified into twelve types; while the discriminant of
the quadratic, expressed in terms of moments of the
curves, yields a “criterion” for judging in advance what
type is appropriate to a proposed frequency distribution.

A full account of the curves, their shape and the process 
of representing frequency data by them is given in 
Elderton’s Frequency Curves and Correlation (3rd edition, 
London, 1938), to which we refer the reader for details. 
Here we have space to mention from time to time only
a few of the curves, as they occur in special problems. 

37. Probability Functions Generated by Change
of Variate. If x is distributed about the mean x = 0
in a normal distribution

$$dp = (2\pi)^{-1/2}e^{-\frac12 x^2}\,dx, \tag{1}$$

it is certainly not the case that $x^2$ is normally distributed;
for putting $z = \tfrac12 x^2$ we have $dx = (2z)^{-1/2}dz$, and so,
the two values $\pm x$ contributing equally to each value of z,

$$dp = \pi^{-1/2}z^{-1/2}e^{-z}\,dz. \tag{2}$$

The range of z is from 0 to $\infty$, and the constant $\pi^{-1/2}$ is
such that the integral of the probability function of z
over this range is 1. The distribution of z is skew, and
is actually a case of Pearson’s Type III.

Ex. 1. Prove that the m.g.f. of z is $(1-\alpha)^{-1/2}$.

Again, if x is distributed between $-\tfrac12$ and $\tfrac12$ in the
rectangular distribution dp = dx, the cube root $z = x^{1/3}$
is distributed, as the reader should verify, in the U-shaped
distribution $dp = 3z^2\,dz$. Or again, to take an example
from physics, if the distribution of the velocities of a
great number of particles about a zero mean velocity
were normal, the distribution of their energies would be
of Type III.

The derivation of probability functions from the
normal function by non-linear change of variate was
emphasized by J. C. Kapteyn in 1903 (Skew Curves in
Biology and Statistics, Groningen), but was by no means
a new conception even at that time.

Ex. 2. If x is a normal variate in standard measure, we
have seen in Ex. 1 that the m.g.f. of $z = \tfrac12 x^2$ is $(1-\alpha)^{-1/2}$.
Hence the m.g.f. of $z = \tfrac12(x_1^2+x_2^2+\dots+x_n^2)$, where the $x_j$ are
independent normal variates with the same mean x = 0 and
in standard measure, is $(1-\alpha)^{-n/2}$. The probability function
which has this m.g.f. is unique, and of the form

$$dp = c\,z^{\frac12 n-1}e^{-z}\,dz.$$

The reader should verify that this function actually has the
above m.g.f., and should find by integration the value of c.

Ex. 3. If x is distributed normally about x = 0 as mean,
find the distributions of : (i) = e®, (ii) z ~ a;®, (iii) z ~ x^. 

38. Cauchy’s Probability Function. The
probability function which we shall next consider arises by
change of variate in a rectangular distribution. Let us
take a point Q on the axis of y at unit distance from the
origin O. Let a straight line be taken at angle $\theta$ to QO,
all values of $\theta$ from $-\tfrac12\pi$ to $\tfrac12\pi$ being equally likely, to
cut the x axis in the point X = (x, 0). What is the
probability distribution of x?

The distribution of $\theta$ is rectangular, $dp = \pi^{-1}d\theta$.
Also $x = \tan\theta$, so that $\theta = \arctan x$, $d\theta = dx/(1+x^2)$.
Hence the distribution of x is given by

$$dp = \frac{1}{\pi}\,\frac{dx}{1+x^2}, \quad \text{range } -\infty \text{ to } \infty.$$

The probability function appearing here is Cauchy’s
probability function. It has the property (very awkward
for any theory of estimation from sample based on
moments) that its moments of even order $\mu_2, \mu_4, \dots$ are
all infinite. The reader should verify this by integration.
It follows at once that linear compounding of independent
variates obeying laws of Cauchy type cannot be carried
out by the addition of seminvariants; in fact the
seminvariant g.f.’s do not converge. This exception to the
common rule gives us a salutary reminder that linear
compounding of independent variates does not necessarily
generate a distribution of normal type.

The Cauchy curve has been found to possess a specially
remarkable property. If n independent variates obeying
the same Cauchy law are added, and the mean is taken,
this mean obeys exactly the same law. Not only so, but
the distribution of any linear combination

$$c_1x_1+c_2x_2+\dots+c_nx_n$$

of variates $x_j$ obeying the same Cauchy law, where the
$c_j$ are positive and sum to 1, is again exactly the same
Cauchy distribution.
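The property can be illustrated by a sampling experiment. In the sketch below, which is ours and not from the original text, Cauchy variates are generated as $x = \tan\theta$ with $\theta$ uniform between $-\tfrac12\pi$ and $\tfrac12\pi$, and the upper quartile of single variates is compared with that of means of 50; for the Cauchy function of 38 both should lie near 1. The seed, the number of repetitions and the sample size are arbitrary choices.

```python
import random
from math import pi, tan

def cauchy(rng):
    """One Cauchy variate: x = tan(theta), theta uniform on (-pi/2, pi/2)."""
    return tan(pi * (rng.random() - 0.5))

def upper_quartile(values):
    ordered = sorted(values)
    return ordered[(3 * len(ordered)) // 4]

rng = random.Random(1)
singles = [cauchy(rng) for _ in range(4000)]
means = [sum(cauchy(rng) for _ in range(50)) / 50 for _ in range(4000)]

# Averaging 50 variates does not narrow the distribution at all.
print(upper_quartile(singles), upper_quartile(means))
```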



The figure shows the normal curve in standard measure 
and the flatter Cauchy curve drawn to the same scale. 

39. The Pearson Curve of Type I. As a final
example of a probability function arising from a particular
problem, let us consider the following:

Suppose that x is distributed in the rectangular
distribution over the range 0 to 1. Let n+1 points
be taken independently in this range. What is the
probability that the (k+1)th point of these, as counted
from the left of the range, is in the elementary interval
$x-\tfrac12 dx$ to $x+\tfrac12 dx$?

The probability is compound; it is the probability
that one, any one, of the n+1 points is in the interval,
and that k of the remaining n are in the range 0 to $x-\tfrac12 dx$
while n-k are in the range $x+\tfrac12 dx$ to 1. Hence the
compound probability is

$$dp = \phi(x)\,dx = n_{(k)}(n+1)\,x^{k}(1-x)^{n-k}\,dx, \tag{1}$$

for the first probability mentioned is $(n+1)\,dx$ and the
second is $n_{(k)}x^{k}(1-x)^{n-k}$. The probability function $\phi(x)$
obtained here is of Pearson’s Type I. It is in fact the
integrand of the Beta function $B(k+1, n-k+1)$ (Gillespie, p. 84),
apart from the factor $n_{(k)}(n+1)$ which ensures that the
area under the curve is 1. Had the range been a to b,
we should have obtained

$$\phi(x) = n_{(k)}(n+1)(b-a)^{-n-1}(x-a)^{k}(b-x)^{n-k}. \tag{2}$$
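That the factor $n_{(k)}(n+1)$ makes the area under the curve equal to 1 can be confirmed exactly for any particular n and k. The sketch below is ours; it assumes that $n_{(k)}$ denotes the binomial coefficient, and it evaluates the Beta integral by expanding $(1-x)^{n-k}$ and integrating term by term in rational arithmetic.

```python
from fractions import Fraction
from math import comb

def beta_integral(k, n):
    """Exact integral of x^k (1-x)^(n-k) over (0, 1), by binomial expansion
    of (1-x)^(n-k) and term-by-term integration."""
    return sum(Fraction((-1) ** j * comb(n - k, j), k + j + 1)
               for j in range(n - k + 1))

def total_probability(k, n):
    """Area under (n+1) * C(n, k) * x^k (1-x)^(n-k) over (0, 1)."""
    return (n + 1) * comb(n, k) * beta_integral(k, n)

n = 9
print([total_probability(k, n) for k in range(n + 1)])
```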

The probability integral of the simpler form (1) over
the partial range (0, x) is called the Incomplete Beta
function. In the same way the integral over (0, x) of
the function

• • - .( 3 ) 

which is a case of Pearson’s Type III, is called the
Incomplete Gamma function.

Variety of Probability Curves. The preceding
survey of types of probability function, though far from 
exhaustive, will have served to dispel the idea, once 
rather prevalent, that normality and symmetry were the 
rule and that skewness was an accident of sampling. The 
rôle of the normal distribution in statistics is not unlike
that of the straight line in geometry; and we do not 
force curves into the mould of the straight line. Skew 
distributions are in fact the predominant type, for skew- 
ness arises from Lexian variability or non-homogeneity, 
from Poissonian statistical rarity, from limitation in the 
number of causes of variation, and from non-linear 
transformations of the scale. 
PRACTICAL CURVE-FITTING WITH
STANDARD CURVES 

40. Representation of Frequency Data by Normal
Curve. The present chapter will be devoted to the
numerical details of representing frequency distributions
by normal curves, curves of Type A, Poissonian curves
and curves of Type B.

In fitting the normal curve, that is, in finding the
equation of the normal function of best approximation
to the given frequency distribution, the idea is to represent
the relative class frequencies by the corresponding segments
of area under the normal curve between neighbouring
ordinates corresponding to consecutive class boundaries.
The mean m of the frequency distribution is taken
as the estimate of the mean $\mu$ of the normal function;
the second moment $m_2$ or $s^2$, corrected for grouping if
necessary by Sheppard’s correction, is taken as the
estimate of the corresponding $\mu_2$ or $\sigma^2$. In order to use
the standardized tables of the normal probability integral
it is best, once m and s have been computed, to
standardize the class boundaries, taking them as deviations
from the mean, in units of s. The values of the probability
integral corresponding to these class boundaries are then
read from tables (Appendix 4); the first differences of these
values are the estimates of the class probabilities; and
finally we may multiply by n, the total number in sample,
to make comparison with the absolute class frequencies.

Example. In the data of heights of Irishmen (18, Ex.)
the mean is 67.34, and $m_2$ with Sheppard’s correction is
4.705 - 0.083 = 4.622. Hence s = 2.15, 1/s = 0.465. The
standardized deviations of class boundaries are shown in the
column $z = (x-m\pm\tfrac12)/s$ below. Since their common
difference is 1/s or 0.465, they are readily found, when once any
one of them has been computed, by repeated addition or
subtraction of 0.465, and the results can be checked at the
ends of the range. The next column shows the values of
½ erf z, the next the first differences of these, the next the
same multiplied by 346, and the final column the original
class frequencies themselves for comparison.


  x    z = (x-m±½)/s    ½ erf z    ½Δ erf z    ½nΔ erf z    obs.

          -∞            -0.5000
 59       -3.646        -0.4999     0.0001          0          1
 60       -3.181        -0.4993     0.0006          0          0
 61       -2.716        -0.4967     0.0026          1          2
 62       -2.251        -0.4878     0.0089          3          2
 63       -1.786        -0.4629     0.0249          9          7
 64       -1.321        -0.4068     0.0561         19         15
 65       -0.856        -0.3040     0.1028         36         33
 66       -0.391        -0.1521     0.1519         53         58
 67        0.074         0.0295     0.1816         63         73
 68        0.539         0.2051     0.1756         61         62
 69        1.004         0.3423     0.1372         47         40
 70        1.469         0.4291     0.0868         30         25
 71        1.934         0.4734     0.0443         15         15
 72        2.399         0.4918     0.0184          6         10
 73         ∞            0.5000     0.0082          3          3

                                                  346        346
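The routine of the Example can be followed by machine, the tables of the probability integral being replaced by a library error function. The sketch below is ours; it takes m, s and n as quoted in the Example, treats the first and last classes as open-ended as in the table, and uses the identity that the area from 0 to z under the standard normal curve is half of `math.erf(z / sqrt(2))`. Small differences from the printed column are to be expected, since the book works with rounded table entries.

```python
from math import erf, sqrt

def half_erf(z):
    """The table's (1/2) erf z: area under the standard normal curve from 0 to z."""
    return 0.5 * erf(z / sqrt(2.0))

m, s, n = 67.34, 2.15, 346        # values quoted in the Example
classes = list(range(59, 74))     # central heights 59, 60, ..., 73

def fitted_frequencies():
    fitted = {}
    lower = -0.5                  # value of (1/2) erf z at z = -infinity
    for x in classes:
        # Upper class boundary is x + 1/2; the last class is open above.
        upper = 0.5 if x == classes[-1] else half_erf((x + 0.5 - m) / s)
        fitted[x] = n * (upper - lower)
        lower = upper
    return fitted

for x, value in fitted_frequencies().items():
    print(x, round(value, 1))
```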
41. Representation by Type A. The coefficients
$a_3, a_4, \dots$ in the series 35 (8) of Type A can be expressed
in terms of the moments about the mean. For by 35 (5)
the m.g.f. (in unstandardized scale) is given by

$$1+\mu_2\alpha^2/2!+\mu_3\alpha^3/3!+\mu_4\alpha^4/4!+\dots$$
$$= \exp(\tfrac12\sigma^2\alpha^2)(1+a_3\sigma^3\alpha^3/3!+a_4\sigma^4\alpha^4/4!+\dots). \tag{1}$$

Multiply each of these expressions by $\exp(-\tfrac12\sigma^2\alpha^2)$
and expand the product in the former case. Equating
coefficients of $\alpha^r/r!$, we have the desired relations

$$a_3 = \mu_3/\sigma^3,\quad a_4 = (\mu_4-3\sigma^4)/\sigma^4,\quad a_5 = (\mu_5-10\sigma^2\mu_3)/\sigma^5,\quad a_6 = (\mu_6-15\sigma^2\mu_4+30\sigma^6)/\sigma^6, \tag{2}$$

and so on.

The routine for fitting Type A is a slight extension
of that used in fitting the normal curve. Moments about
the mean are computed and if necessary corrected by
Sheppard’s corrections. The coefficients $a_3, a_4, \dots$ are
estimated from these moments by the formulae just given,
with $m_r$ substituted for $\mu_r$. The integral of the
corresponding Type A series is then taken instead of the normal
probability integral. This involves the necessity, if terms
in $a_3$ and $a_4$ are included, of having supplementary tables
of the integrals of the functions which appear in these
terms, that is, tables of

$$F_3(z) = -\frac{1}{3!}\int_0^z\phi'''(z)\,dz \quad\text{and}\quad F_4(z) = \frac{1}{4!}\int_0^z\phi^{\mathrm{iv}}(z)\,dz.$$

Such tables have been computed and are available.
(British Association Tables, 1931; Bowley, Elements of
Statistics, p. 303, $F_3(z)$ only.)

Example. (Bowley, Elements of Statistics, p. 309.) To
fit two terms of a series of Type A to data giving age
distribution of St Louis school children in the sixth grade.
(Age x means x to x+1.)

x    10   11   12    13   14   15   16   17   18      n
f    26  201  673  1001  739  310   80   13    1   3044

By the usual routine we compute m = 13.665, $m_2$ = 1.498,
$m_3$ = 0.356. Hence, using Sheppard’s corrections, the
corrected

$m_2$ = 1.498 - 0.083 = 1.415, s = 1.190, 1/s = 0.840,

estimated $a_3 = m_3/s^3$ = 0.211.

The rest of the working can be arranged in columns as
below.


(1)      (2)         (3)       (4)       (5)        (6)       (7)     (8)    (9)    (10)
 x   z = (x-m)/s   ½ erf z    F₃(z)    a₃F₃(z)    (3)+(5)      Δ      nΔ     obs.  normal

10       -∞        -0.5000   -0.0665   -0.0140    -0.5140
11      -2.24      -0.4875   -0.0882   -0.0186    -0.5061   0.0079     24     26     38
12      -1.40      -0.4192   -0.0904   -0.0191    -0.4383   0.0678    206    201    208
13      -0.56      -0.2123   -0.0275   -0.0058    -0.2181   0.2202    670    673    630
14       0.28       0.1103   -0.0076   -0.0016     0.1087   0.3268    995   1001    982
15       1.12       0.3686   -0.0755   -0.0159     0.3527   0.2440    743    739    786
16       1.96       0.4750   -0.0942   -0.0199     0.4551   0.1024    312    310    324
17       2.80       0.4974   -0.0755   -0.0159     0.4815   0.0264     80     80     68
18       3.64       0.4999   -0.0672   -0.0142     0.4857   0.0042     13     13      8
19        ∞         0.5000   -0.0665   -0.0140     0.4860   0.0003      1      1      0

                                                                     3044   3044   3044

(Each entry in columns (7) to (10) belongs to the class lying
between the boundary on its own row and that on the row above.)


The closeness to the observations is remarkable. Indeed
the tests of “goodness of fit,” to be developed in 54, show
that the discrepancies are so small as to be improbable, and
the representation is unsatisfactory. We have here a case
of “over-fitting.”

For comparison we have included in a final column the 
results given by the normal curve of best agreement. 
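The whole computation of this Example can be sketched in a few lines. The code below is ours and rests on stated assumptions: moments are taken about the mean with class mid-points x+½, Sheppard’s correction 1/12 is applied to the second moment only, and the integral of the two-term series from $-\infty$ to z is taken in closed form as the normal integral less $a_3\phi''(z)/3!$, with $\phi''(z) = (z^2-1)\phi(z)$. This differs from the tabulated ½ erf z + a₃F₃(z) only by a constant, which disappears on differencing.

```python
from math import erf, exp, pi, sqrt

ages = list(range(10, 19))
freq = [26, 201, 673, 1001, 739, 310, 80, 13, 1]
n = sum(freq)

mid = [x + 0.5 for x in ages]          # age x means x to x+1
m = sum(f * x for f, x in zip(freq, mid)) / n
m2 = sum(f * (x - m) ** 2 for f, x in zip(freq, mid)) / n
m3 = sum(f * (x - m) ** 3 for f, x in zip(freq, mid)) / n
s = sqrt(m2 - 1.0 / 12.0)              # Sheppard's correction, unit class width
a3 = m3 / s**3

def cumulative(z):
    """Integral from -infinity to z of phi(z) - a3 * phi'''(z) / 3!."""
    phi = exp(-0.5 * z * z) / sqrt(2.0 * pi)
    return 0.5 * (1.0 + erf(z / sqrt(2.0))) - a3 * (z * z - 1.0) * phi / 6.0

# Class boundaries at 11, 12, ..., 18; first and last classes open-ended.
cuts = [0.0] + [cumulative((x - m) / s) for x in ages[1:]] + [1.0]
fitted = [n * (hi - lo) for lo, hi in zip(cuts, cuts[1:])]

print(round(m, 3), round(m2, 3), round(a3, 3))
print([round(v) for v in fitted])
```

The fitted frequencies may differ by a unit here and there from column (8), since the book works from z rounded to two decimal places.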

42. Representation by Poissonian Function or
Type B. The coefficients $b_2, b_3, \dots$ in the series 35 (11)
of Type B can be expressed in terms of the factorial
moments $\mu_{(r)}$. For by 35 (10) the f.m.g.f.
is given by

$$1+\mu\alpha+\mu_{(2)}\alpha^2/2!+\mu_{(3)}\alpha^3/3!+\dots = \exp(\mu\alpha)(1+b_2\alpha^2/2!+b_3\alpha^3/3!+\dots). \tag{1}$$

Multiply each of these expressions by $\exp(-\mu\alpha)$ and
expand the product in the former case. Equating
coefficients of $\alpha^r/r!$ we have the desired relations

$$b_2 = \mu_{(2)}-\mu^2,\quad b_3 = \mu_{(3)}-3\mu_{(2)}\mu+2\mu^3,\quad b_4 = \mu_{(4)}-4\mu_{(3)}\mu+6\mu_{(2)}\mu^2-3\mu^4, \tag{2}$$

and so on. Note that the numerical coefficients are the
same as occur in 14 (5).

The procedure of fitting by Type B is therefore to
compute factorial moments of the data by the summation
method (Appendix 2) and by substitution in the above
formulae to estimate the coefficients $b_2, b_3, \dots$ of the
Type B series. For the rest of the work we require the
values of $e^{-m}m^x/x!$ and its differences of as many orders
as may be necessary.

The value of $e^{-m}$ can be taken from a table of the
exponential function. Then $ne^{-m}$ is computed, after
which each value of $ne^{-m}m^x/x!$ can be obtained from
the preceding value, corresponding to x-1, by multiplying
by m/x, most easily done by a calculating machine. The
subsequent differencings and multiplication by coefficients
$b_2$ and so on can best be followed from the illustrative
example.

Example. E. Rutherford and H. Geiger, in 2608
experiments (Phil. Mag., Ser. 6, 20, 1910, p. 698) on the number x
of α-particles radiated from a disc in 7.5 seconds, obtained
the distribution:

x     0    1    2    3    4    5    6    7    8    9   10   11  12-14      n
f    57  203  383  525  532  408  273  139   45   27   10    4      2   2608

The summation method for factorial moments gives
m = 3.870, $m_{(2)}$ = 14.784, whence the estimate of $b_2/2!$ is

$\tfrac12(14.784-3.870^2) = -0.0965.$

The working is set out in columns as below.


(1)     (2)        (3)        (4)          (5)          (6)     (7)     (8)
 x      nψ         ∇nψ        ∇²nψ     (b₂/2!)∇²nψ    (2)+(5)   obs.  Poisson

 0     54.40      54.40      54.40        -5.25          49      57      54
 1    210.52     156.12     101.72        -9.82         201     203     211
 2    407.37     196.85      40.73        -3.93         403     383     407
 3    525.49     118.12     -78.73         7.60         533     525     526*
 4    508.43     -17.06    -135.18        13.05         522     532     509*
 5    393.52    -114.91     -97.85         9.44         403     408     394
 6    253.81    -139.71     -24.80         2.39         256     273     254
 7    140.34    -113.47      26.24        -2.53         138     139     140
 8     67.89     -72.45      41.02        -3.96          64      45      68
 9     29.18     -38.71      33.74        -3.26          26      27      29
10     11.29     -17.89      20.82        -2.01           9      10      11
11      3.96      -7.33      10.56        -1.02           3       4       4
12      1.28      -2.68       4.65        -0.45           1       2       1
13      0.39      -0.89       1.79        -0.17           0       0       0

                                                        2608    2608    2608

N.B. (i) In the differencings in columns (3) and (4) $\psi(-1)$,
$\psi(-2)$, ... are tacitly taken as zero. (ii) The asterisked entries in
column (8) have been raised from those in column (2) to make the
totals of columns (7) and (8) both come to 2608.

It will appear when we come to consider goodness of fit
(64) that the representation by the Poisson function alone,
without the term in $b_2$, is satisfactory.
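The columns of the working can be reproduced as follows. The sketch is ours; it takes m and the second factorial moment as quoted in the Example rather than recomputing them from the grouped data, builds $n\psi(x)$ by the recurrence described in the text, and forms receding differences with the values before x = 0 taken as zero.

```python
from math import exp

n, m, m2_fact = 2608, 3.870, 14.784   # values quoted in the Example
half_b2 = 0.5 * (m2_fact - m * m)     # estimate of b2/2!

# Column (2): n*psi(x), each value from the preceding one on multiplying by m/x.
npsi = [n * exp(-m)]
for x in range(1, 14):
    npsi.append(npsi[-1] * m / x)

def receding_difference(values):
    """Receding differences, the value before the first being taken as zero."""
    return [v - (values[i - 1] if i > 0 else 0.0) for i, v in enumerate(values)]

second = receding_difference(receding_difference(npsi))   # column (4)
fitted = [a + half_b2 * d for a, d in zip(npsi, second)]  # column (6)

for x, (a, d, f) in enumerate(zip(npsi, second, fitted)):
    print(x, round(a, 2), round(d, 2), round(f))
```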


43. Limitations on the Use of Moments in Fitting
Curves. The discussion of the Cauchy distribution in
38 has shown that moments are by no means always, or
necessarily, the best parameters to use in representing an
observed frequency distribution by a probability
distribution of assigned functional form. It depends entirely on
the nature of the probability function what parameters
may be used with adequacy. For example, since the
mean of any number of observations x, each of which
obeys the same Cauchy distribution, has exactly the same
Cauchy distribution as x, it follows that the mean of sample
in this case is no more accurate, for the purpose of estimating
the centroid of the curve, than any single observation;
indeed it may be shown that the median is much superior
for this purpose, while still better parameters can be
found. Again, for the purpose of estimating the centre
of an unknown rectangular probability distribution the
mean of the n sample observations is quite a good
estimate; but surprisingly enough, as R. A. Fisher has
shown, the mean of the two extreme observations alone is
remarkably better. As a general precept it may be
stated that for probability curves of shape and properties
approximating to the normal curve the use of the mean
and moments of the frequency distribution gives good
estimates for those parameters in the probability
distribution; but for other probability curves better parameters
can be found.
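Both remarks can be illustrated by a sampling experiment. The sketch below is ours and makes no claim beyond illustration: it draws repeated samples of 21 from a Cauchy distribution centred at 0 and compares the spread of the sample mean with that of the sample median, and then draws samples of 21 from the rectangular distribution on (0, 1) and compares the variance of the sample mean with that of the mean of the two extreme observations. Seed, sample size and number of repetitions are arbitrary.

```python
import random
from math import pi, tan
from statistics import median, pvariance

rng = random.Random(7)
reps, size = 2000, 21

def interquartile_range(values):
    ordered = sorted(values)
    k = len(ordered) // 4
    return ordered[3 * k] - ordered[k]

# Cauchy samples: sample mean against sample median.
cauchy_means, cauchy_medians = [], []
for _ in range(reps):
    sample = [tan(pi * (rng.random() - 0.5)) for _ in range(size)]
    cauchy_means.append(sum(sample) / size)
    cauchy_medians.append(median(sample))

# Rectangular samples: sample mean against mean of the two extremes.
rect_means, rect_midranges = [], []
for _ in range(reps):
    sample = [rng.random() for _ in range(size)]
    rect_means.append(sum(sample) / size)
    rect_midranges.append(0.5 * (min(sample) + max(sample)))

print(interquartile_range(cauchy_means), interquartile_range(cauchy_medians))
print(pvariance(rect_means), pvariance(rect_midranges))
```

The interquartile range is used for the Cauchy case because the variance of the sample mean does not exist there.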



CHAPTER V


PROBABILITY AND FREQUENCY IN 
TWO VARIATES 

44. Bivariate Distributions : Correlation and
Regression. Hitherto we have been concerned
exclusively with probability and frequency distributions 
in one variate, that is, with univariate distributions. But 
most of the important and interesting applications of 
statistics involve bivariate, trivariate or multivariate 
distributions. 

Let us consider how a typical bivariate frequency 
distribution may arise. Suppose that 1000 soldiers in a 
regiment are measured in height, x, and in weight, y. 
The measurements provide 1000 paired numbers $(x_j, y_j)$, 
which may be plotted as points in a plane. The resulting 
assemblage of points may be called the “dot diagram.” 



Now there may be, and in fact in the case of height 
and weight there is, a tendency for the value of $y_j$ to 
conform in some way to that of the corresponding $x_j$; 
greater height as a rule is associated with greater weight. 
Any such tendency towards a functional relationship, 
obscured by random deviations, will manifest itself in 
the dot diagram by the greater density of the dots along 
a certain locus. This locus is not sharply outlined, but 
its estimation is important, for it is a smudged image 
of a curve which may be fine and clear-cut in the parent 
population of which the observations are a sample. This 
latent curve or functional relation $y = F(x)$ is called a 
regression, the regression of y on x. It will be a matter of 
judgement what functional basis is chosen for its mathematical 
representation. Usually the representation is a 
linear one based on a set of prescribed functions $p_1(x)$, 
$p_2(x)$, ..., the regression therefore appearing in the 
form 

$$ y = a_0 + a_1 p_1(x) + a_2 p_2(x) + \dots, \tag{1} $$

to as many terms as are judged adequate. The statistical 
problem is then to determine the best estimates of $a_0$, 
$a_1$, $a_2$, ... from the n paired observations $(x_j, y_j)$. The 
functions $p_i(x)$ are commonly polynomial or harmonic 
functions, but they may be of any preassigned functional 
type. 

The diagram of dots suggests a second point of 
view. The proportion of dots in an elementary region 
of sides $\Delta x$ and $\Delta y$ centred at the point $(x, y)$ gives an element 
of bivariate relative frequency which corresponds to a 
bivariate differential element of probability, let us say 
$dp = \phi(x, y)\,dx\,dy$, in the parent population. 

We may imagine that on each class-rectangle of the 
network of rectangles delimited by class boundaries of 
x and y a right prism is erected, of volume proportional 
to the corresponding class frequency. The tops of these 
prisms make a surface of flat terraces which we may call 
the prismogram, the analogue in three dimensions of the 
histogram. This prismogram, then, is the rough sampling 
approximation to an ideal probability surface $z = \phi(x, y)$, 
which is often called the correlation surface. 

The functional dependence of y on x may be investigated 
either by the method of correlation, which consists in 
estimating the parameters of the bivariate probability 
function $\phi(x, y)$, or by the method of regression, which 
consists in estimating the coefficients in the regression 
function (1). Naturally the methods overlap to a certain 
extent. In the case of several important correlation 
(bivariate probability) functions the corresponding regres- 
sion curves are straight lines. 

45. Binomial and Hypergeometric Correlation. 
The natural extension of the twofold division, success and 
failure of an event E, which gives rise to the binomial or 
hypergeometric distribution in one variate, is an arrangement 
giving a twofold division in each of two events 
E and F. Such an arrangement is expressed by the 
fourfold table, as follows: 

Let the probabilities of the double events $(E, F)$, $(E, \bar{F})$, 
$(\bar{E}, F)$ and $(\bar{E}, \bar{F})$ be $p_{11}$, $p_{10}$, $p_{01}$, $p_{00}$. These are set 
out as shown in the fourfold table, the columns referring 
to $E$ and $\bar{E}$, the rows to $F$ and $\bar{F}$. 

                  E        \bar{E}
      F         p_11        p_01     |  p'
      \bar{F}   p_10        p_00     |  q'
                 p           q       |  1

The sum $p_{11} + p_{10}$, 
representing the total probability of E whether F occurs 
or not, is entered marginally as p; and in the same way 
the other total probabilities q, p', q' are entered marginally 
as sums of a row or of a column. 

Contingency Table. The fourfold table is a simple 
example of a contingency table. The more general bivariate 
contingency table has h rows and k columns, corresponding 
to the division of one system into h categories and the 
other into k. If the probabilities are all rational 
fractions, it is possible to represent the bivariate population 
by a physical model, such as one of marked or coloured 
balls in due proportions. 

If E and F are independent events, then $p_{11} = pp'$, 
$p_{10} = pq'$, $p_{01} = qp'$ and $p_{00} = qq'$, so that $p_{11}p_{00} = p_{10}p_{01}$. 
The determinant $p_{11}p_{00} - p_{10}p_{01}$ of the fourfold table is 
thus zero. 

Ex. 1. Prove that this determinant is equal to $p_{11} - pp'$ 
and to $p_{00} - qq'$. 

Generating Functions. Just as in 7 we introduced 
a variable t to carry x as exponent in univariate generating 
functions, so it is natural to introduce u to carry y. The 
probability g.f. of a fourfold table will thus be 

$$ G(t, u) = p_{11}tu + p_{10}t + p_{01}u + p_{00} \tag{1} $$

$$ \qquad = 1 + p(t-1) + p'(u-1) + p_{11}(t-1)(u-1). \tag{2} $$

Ex. 2. Show that in the case of independence this splits 
into the two factors $pt + q$, $p'u + q'$. 

Now if we draw n times, with replacement each time, 
from the population characterized by the fourfold table, 
the g.f. will be 

$$ (p_{11}tu + p_{10}t + p_{01}u + p_{00})^n. \tag{3} $$

The coefficient of $t^x u^y$ in the expansion of this g.f. will 
be the probability $\phi(x, y)$ of having x cases E and y cases F 
in the n drawings. The function $\phi(x, y)$ is the correlation 
function of binomial type. 
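
The coefficient extraction can be sketched numerically. The code below is an illustration only, not the author's procedure: it expands the g.f. (3) by summing multinomial terms, with k, a, b, c as hypothetical names for the numbers of drawings falling in the four cells. The usage lines use made-up probabilities.

```python
from math import comb, factorial


def binomial_correlation(n, p11, p10, p01, p00):
    """Return {(x, y): phi(x, y)}, the coefficients of t^x u^y in
    (p11*t*u + p10*t + p01*u + p00)**n."""
    phi = {}
    for k in range(n + 1):                      # drawings giving both E and F
        for a in range(n - k + 1):              # drawings giving E only
            for b in range(n - k - a + 1):      # drawings giving F only
                c = n - k - a - b               # drawings giving neither
                coef = factorial(n) // (
                    factorial(k) * factorial(a) * factorial(b) * factorial(c)
                )
                term = coef * p11**k * p10**a * p01**b * p00**c
                key = (k + a, k + b)            # x counts E, y counts F
                phi[key] = phi.get(key, 0.0) + term
    return phi


def binomial_pmf(n, x, p):
    """Ordinary univariate binomial probability."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


if __name__ == "__main__":
    # Independent case: p = 0.3, p' = 0.4, so p11 = pp', p10 = pq', etc.
    phi = binomial_correlation(6, 0.12, 0.18, 0.28, 0.42)
    print(phi[(2, 3)], binomial_pmf(6, 2, 0.3) * binomial_pmf(6, 3, 0.4))
```

In the independent case the two printed numbers should agree to rounding error, which is the content of Ex. 3.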

Ex. 3. If the variates x and y are independent, show that 
$\phi(x, y)$ is the simple product of the binomial probability 
functions 

$$ \binom{n}{x} p^x q^{n-x} \quad\text{and}\quad \binom{n}{y} p'^y q'^{n-y}. $$

Again, if we have a fourfold population of N individuals 
with $Np_{11}$, $Np_{10}$, $Np_{01}$, $Np_{00}$ individuals in the respective 
categories, and if we sample n times without replacement, 
the corresponding probability of x cases E and y cases F 
is the correlation function of hypergeometric type. If x 
and y are independent the function is the simple product 
of hypergeometric probability functions in x and y. 


46. Bivariate Moments and Moment Generating 
Functions. The bivariate product moment of order r 
in x and s in y is defined by 

$$ \mu'_{rs} = \sum\sum \phi(x, y)\, x^r y^s \quad\text{or}\quad \iint \phi(x, y)\, x^r y^s \,dx\,dy, \tag{1} $$

or the corresponding mean values with $\sum_x \int dy$ or $\int dx \sum_y$, 
according to the discrete or continuous nature of the 
variables. 

There are three moments of the second order. If we 
take them with respect to the means $\mu'_{10}$ and $\mu'_{01}$ of the 
variates they are $\mu_{20}$, the variance of x, $\mu_{02}$, the variance 
of y, and $\mu_{11}$, the product moment of x and y, often called 
the covariance. 

Generating Functions. The bivariate generating 
function of probability is defined by 

$$ G(t, u) = \sum\sum \phi(x, y)\, t^x u^y \quad\text{or}\quad \iint \phi(x, y)\, t^x u^y \,dx\,dy, \tag{2} $$

or the same with $\sum_x \int dy$ or $\int dx \sum_y$. 

Moment generating functions are defined by putting 
$t = e^{\alpha}$, $u = e^{\beta}$, the general product moment being the 
coefficient of $\alpha^r \beta^s / r!\,s!$ in the resulting m.g.f. 

Factorial moments can be defined by putting factorials 
$x^{(r)}$ and $y^{(s)}$, as defined in 29 (2), instead of powers $x^r$ 
and $y^s$; and a bivariate f.m.g.f. may be constructed by 
putting $t = 1 + \alpha$, $u = 1 + \beta$. 

Example. Prove that the f.m.g.f. of the fourfold table is 
$(1 + p\alpha)(1 + p'\beta) + (p_{11}p_{00} - p_{10}p_{01})\alpha\beta$. 


47. Normal Correlation as the Limit of Binomial 
Correlation. The g.f. of a sample of n drawings, with 
replacement each time, from the fourfold population is 

$$ [1 + p(t-1) + p'(u-1) + p_{11}(t-1)(u-1)]^n \tag{1} $$

by 45 (2). 

Just as at the corresponding stage in 31, let us consider 
the deviation of relative frequency of number of successes 
from means, rather than absolute frequency. We do this 
by putting $t = e^{\alpha}$, $u = e^{\beta}$ in (1) and then writing $\alpha/n$ for 
$\alpha$, $\beta/n$ for $\beta$. We have then the bivariate m.g.f. 

$$ [1 + p\alpha/n + p\alpha^2/2n^2 + p'\beta/n + p'\beta^2/2n^2 + p_{11}\alpha\beta/n^2 + O(n^{-3})]^n $$

$$ = [(1 + p\alpha/n + \tfrac{1}{2}p^2\alpha^2/n^2)(1 + p'\beta/n + \tfrac{1}{2}p'^2\beta^2/n^2) + \tfrac{1}{2}\{pq\alpha^2 + 2(p_{11} - pp')\alpha\beta + p'q'\beta^2\}/n^2 + O(n^{-3})]^n, \tag{2} $$

which tends asymptotically as n increases (the assumption 
throughout being that none of the probabilities in the 
fourfold table is $O(n^{-1})$) to 

$$ \exp(p\alpha + p'\beta)\, \exp \tfrac{1}{2}(\sigma_1^2\alpha^2 + 2\rho\sigma_1\sigma_2\alpha\beta + \sigma_2^2\beta^2), \tag{3} $$

where $\sigma_1^2 = pq/n$, $\sigma_2^2 = p'q'/n$, $\rho\sigma_1\sigma_2 = (p_{11} - pp')/n$. 

Next, just as in the case of one variate treated in 31, 
and for analogous reasons, the question is to find a 
continuous function $\phi(x, y)$ satisfying 

$$ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \phi(x, y)\, e^{\alpha x + \beta y}\,dx\,dy = \exp \tfrac{1}{2}(\sigma_1^2\alpha^2 + 2\rho\sigma_1\sigma_2\alpha\beta + \sigma_2^2\beta^2). \tag{4} $$

The answer provided by pure mathematics is that the 
only function $\phi(x, y)$ for which this is the case over a finite 
domain in $\alpha$ and $\beta$ together is 

$$ \phi(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp[-\tfrac{1}{2}Q(x, y)], \quad\text{where}\quad Q(x, y) = \frac{1}{1-\rho^2}\left(\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right). \tag{5} $$

This function, the analogy of which with the normal 
probability function in one variable is evident, is the 
bivariate normal probability function or normal correlation 
function. The parameter $\rho$ is called the coefficient of 
correlation. The reader will verify at once that when 
$\rho = 0$ the correlation function breaks up, as one might 
have expected, into the product of two ordinary normal 
functions, in x and y respectively. 


48. Properties of the Normal Correlation Function. 
Let us suppose that units of scale in x and y are standardized 
by putting $\sigma_1 = 1$, $\sigma_2 = 1$. The m.g.f. of the normal 
correlation function about the means then becomes 

$$ \exp \tfrac{1}{2}(\alpha^2 + 2\rho\alpha\beta + \beta^2), \tag{1} $$

and the coefficient of $\alpha\beta/1!\,1!$ in the expansion of this 
shows that $\rho$ is the mean value of the product xy. This 
suggests that in computing the parameters of bivariate 
frequency distributions we should add to the usual four 
parameters of first and second order, namely the means 
$m_{10}$, $m_{01}$ and variances $s_1^2$ and $s_2^2$ of x and y, a fifth parameter, 
the mean value of the product of corresponding deviations 
of x and y from the sample means. 

The standardized value of this mean product, namely, 

$$ r = \sum (x - m_{10})(y - m_{01}) / n s_1 s_2, \tag{2} $$

corresponds in the sample to $\rho$ in the population or 
probability function. We shall call r the Pearsonian 
coefficient, or product-moment coefficient, of x and y in 
the frequency distribution. 

Limits of r and $\rho$. The extreme values that r and 
$\rho$ can take are 1 and $-1$. They cannot lie outside those 
limits. For, taking x and y as unstandardized deviations 
from their means, let us consider the mean value of 
$(hx + ky)^2$, or $h^2x^2 + 2hkxy + k^2y^2$, both in population and 
in sample, where h and k are arbitrary. In population 
the mean value is $h^2\sigma_1^2 + 2hk\rho\sigma_1\sigma_2 + k^2\sigma_2^2$; in sample it is 

$$ h^2s_1^2 + 2hkrs_1s_2 + k^2s_2^2. \tag{3} $$

Now these quadratic expressions in h and k, being the 
mean values of squared functions, are of necessity not 
negative. But the necessary condition for this is that the 
discriminants 

$$ (\rho\sigma_1\sigma_2)^2 - \sigma_1^2\sigma_2^2 \quad\text{and}\quad (rs_1s_2)^2 - s_1^2s_2^2 \tag{4} $$

should not be positive. Hence $\rho^2 \le 1$, $r^2 \le 1$, so that 
both $\rho$ and r must lie in the range $-1$ to 1. 

The result, it may be noted, depends on a property of 
quadratic expressions, and therefore holds not merely for 
normal but for any distribution of x and y. 

Example. Prove that if x and y are uncorrelated and of 
unit variance, $x\cos\theta + y\sin\theta$ and $x\sin\theta - y\cos\theta$ are also 
uncorrelated and of unit variance. 

In the case of independent variates, under any laws of 
distribution, the product moment $\mu_{11}$ about the means is 
zero. For if the separate m.g.f.’s of x and y about their 
means are 

$$ 1 + \mu_{20}\alpha^2/2! + \dots \quad\text{and}\quad 1 + \mu_{02}\beta^2/2! + \dots, \tag{5} $$

then by compound probability the m.g.f. of the two 
together is 

$$ (1 + \mu_{20}\alpha^2/2! + \dots)(1 + \mu_{02}\beta^2/2! + \dots), \tag{6} $$

and since this has no term in $\alpha\beta$ we have $\mu_{11} = 0$. 

It is most important to notice that the converse 
theorem is not true. The vanishing of $\mu_{11}$ does not imply 
independence. Consider for example the case when x is 
distributed in any symmetrical distribution about the 
mean $x = 0$, with variance $\sigma^2$. Then $z = x^2 - \sigma^2$ is also 
distributed about a zero mean. The variates x and z 
have complete functional dependence. Their product xz 
however is $x^3 - \sigma^2 x$, and the mean value of this is clearly 
zero. This is an extreme case, but it gives a sharp 
warning against inferring the existence of independence 
from a zero value of $\rho$, and still more from a zero value 
of r, which is merely an estimate of $\rho$. 
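
A small exact check of this warning, offered as an illustration rather than as part of the original argument: the five-point symmetric distribution below is an assumed example, chosen only because its moments can be computed with exact fractions.

```python
from fractions import Fraction

# Assumed example: x takes the values -2, -1, 0, 1, 2 with equal probability.
xs = [-2, -1, 0, 1, 2]
prob = Fraction(1, len(xs))


def mean(values):
    """Mean value under the equal-probability distribution above."""
    return sum(prob * v for v in values)


variance_x = mean([x * x for x in xs])             # sigma^2 (the mean of x is 0)
zs = [x * x - variance_x for x in xs]              # z = x^2 - sigma^2
mean_z = mean(zs)
product_moment = mean([x * z for x, z in zip(xs, zs)])

# z is completely determined by x, so the variates are not independent:
# P(z = 2) is 2/5, whereas P(z = 2 | x = 2) is 1.
p_z_equals_2 = mean([1 if z == 2 else 0 for z in zs])

if __name__ == "__main__":
    print(variance_x, mean_z, product_moment, p_z_equals_2)
```

The product moment comes out exactly zero although z is a function of x.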

The normal correlation surface, when $\rho = 0$ and 
variances are standardized to unity, is a symmetrical 
bell-shaped surface which may be generated by the rotation 
of its central vertical section, a normal curve, about the 
vertical axis. When $\rho \neq 0$ the surface acquires a hog-back 
ridge which lies in the first and third quadrants of 
$(x, y)$ if $\rho$ is positive, in the second and fourth quadrants 
if $\rho$ is negative. 

The loci of equal probability density are found 
by equating $\phi(x, y)$ to a constant, yielding curves of the 
form 

$$ x^2/\sigma_1^2 - 2\rho xy/\sigma_1\sigma_2 + y^2/\sigma_2^2 = c^2. \tag{7} $$

These are homothetic ellipses. Among them the ellipse 
which includes a region in x and y of total probability $\tfrac{1}{2}$ is 
sometimes called the “probable ellipse,” a name, like 
“probable error” in 15, apt to mislead. This region is 
the bivariate analogue of the interquartile range (15). 

49. Regression Lines in Bivariate Normal Correlation. 
If we cut the normal correlation surface by a 
series of planes all perpendicular to the axis of x, the sections 
are all normal curves. For each such section corresponds 
to a constant value $x_1$ of x, and so the z-ordinate of such 
a section is, in standard scale, 

$$ z = \phi(x_1, y) = c \exp[-\tfrac{1}{2}(x_1^2 - 2\rho x_1 y + y^2)/(1 - \rho^2)] \tag{1} $$

$$ \qquad = c_1 \exp[-\tfrac{1}{2}(y - \rho x_1)^2/(1 - \rho^2)], \tag{2} $$

where c and $c_1$ are constants; and this is the ordinate 
of a normal curve with mean at $y = \rho x_1$ and of variance 
$1 - \rho^2$, or in unstandardized scale $\sigma_2^2(1 - \rho^2)$; this variance 
is the same for all such sections. The locus of the means 
of such sections is therefore the straight line $y = \rho x$, or 
in unstandardized units $y/\sigma_2 = \rho x/\sigma_1$. This straight line 
is the regression line of y on x. There is correspondingly 
a regression line $x = \rho y$, or $x/\sigma_1 = \rho y/\sigma_2$, of x on y. 

The regression lines do not coincide unless $\rho = \pm 1$, 
in which case (with standard units) they are the bisectors 
of the angles between the x and y axes. If $\rho = 0$ the 
regression lines are the axes themselves; but the concept 
of regression is of little importance in this case. 
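
The passage from (1) to (2) rests on completing the square in y. The step is written out here as a worked sketch; the explicit value of $c_1$ is an inference from the algebra and is not stated in the text.

```latex
\begin{aligned}
x_1^2 - 2\rho x_1 y + y^2 &= (y - \rho x_1)^2 + (1 - \rho^2)\,x_1^2, \\
c \exp\!\left[-\tfrac{1}{2}\,\frac{x_1^2 - 2\rho x_1 y + y^2}{1 - \rho^2}\right]
  &= c\, e^{-x_1^2/2} \exp\!\left[-\tfrac{1}{2}\,\frac{(y - \rho x_1)^2}{1 - \rho^2}\right],
  \qquad\text{so that } c_1 = c\, e^{-x_1^2/2}.
\end{aligned}
```

Since $c_1$ depends on $x_1$ only, each section is a normal curve in y with mean $\rho x_1$ and variance $1 - \rho^2$.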

Note. The name “regression” was introduced by Sir Francis 
Galton (J. Anthrop. Inst., 15 (1886), p. 246). In bivariate 
data concerning heights of fathers, x, and heights of eldest 
sons, y, he found that the regression lines, as estimated from 
the sample, were approximately y = ^x,x ^ ^y. This implies, 
for example, that if there is a group of fathers whose heights 
all deviate from mean height by d inches, then the average 
deviation of the height of their sons from mean height is only 
^d. There is thus a tendency, in the next generation, to 
return or regress towards the mean. If this feature of 
regression were not present, a character such as height 
might acquire greater and greater dispersion in succeeding 
generations. 

50. Correlation Table : Computation of Product-Moment. 
A contingency table of h rows and k columns 
in which both variables x and y are metrical is called a 
correlation table. If x and y are continuous variates it 
will be convenient to take a class-unit of suitable size 
for each and thus to have class-frequencies corresponding 
to class-rectangles. For practical purposes it is advisable 
to choose these units so that each variate has ten or a 
dozen classes, not more. 

The following example illustrates the usual appearance 
of a correlation table. (The distribution is of Binet 
Intelligence Quotient, x, and Verbal Score, y, of 500 Scottish 
schoolgirls born in 1921, tested in the first week of June 
1932. The Intelligence of Scottish Children, Univ. of 
London Press, 1933, p. 96.) The score named 60 means 
60 and over, that is, the class 60 to 69, so that the class 
marked 60 in the report should be centred at 64.5; and 
so for other classes. The sums of rows and columns are 
entered in the margins; they give the frequency distribution 
of x when variation in y is neglected, and of y when 
variation in x is neglected. 


                         x (Binet I.Q.), columns; y (Verbal Score), rows

         y |   60   70   80   90  100  110  120  130  140  150 |  f_y
        70 |                                       2           |    2
        60 |                        3    2    6    3    4    1 |   19
        50 |                  10   15   26   19   14    2      |   86
        40 |         2    7   32   43   23    7    2    0    1 |  117
        30 |         2   28   50   31   15    2    1           |  129
        20 |        10   32   38    6    1                     |   87
        10 |        11   28    4                               |   43
         0 |    3    7    7                                    |   17
       f_x |    3   32  102  134   98   67   34   22    6    2 |  500


From the marginal distributions we can proceed to 
compute the means and mean-square-deviations from 
means of x and y. This will always be the first step in 
computing r. The product-moment can be computed 
about provisional means, and then transferred by a 
correction to the true means, thus: 

Since $\sum x/n = m_{10}$, $\sum y/n = m_{01}$, 
the product-moment about these means is 

$$ \sum (x - m_{10})(y - m_{01})/n = \sum xy/n - m_{10}m_{01}. \tag{1} $$

Hence, just as mean-square-deviations can be computed 
about a provisional mean and transferred (14) to 
the true mean by subtracting a square, so mean-product-deviations 
can be so transferred by subtracting the 
corresponding product of deviations of provisional means. 
It is to be observed that this product may be negative, 
in which case the correction involves an addition. 

Several different methods of computation are in use 
for finding r. We shall exemplify two, of which the rest 
are mostly variants. 

(i) The first method consists in computing $\sum\sum xy$ 
piecemeal according to the contributions made to this 
sum by the frequencies in the rows, or alternatively in 
the columns. For example, in the kth row, for $y = y_k$ 
constant, we compute $\sum_j f_{jk} x_j$, that is, multiply each 
class-frequency $f_{jk}$ by the value of x, $x_j$, and add along the 
row. For the different rows we may enter these values 
in a suitable column to the right. The sum of 
such values for all rows gives $\sum\sum x$, and so may be used 
to check the mean $m_{10}$; while if we multiply each entry 
in that added column by its appropriate $y_k$ and sum down 
the column we have $\sum\sum xy$. 

The same procedure may be carried out by columns 
instead of rows. We then have a check on both means 
and on $\sum\sum xy$. The whole scheme can be neatly arranged 
in rows and columns annexed to the table as below. The 
special value of the arrangement is perceived when it is 
found necessary to compute correlation ratios (52) as well 
as correlation coefficients. It simplifies the arithmetic, 
too, to choose units such that the class-breadths of x 
and y are both unity. 

Ex. 1. By way of explanation of the entries, note 
that the second entry, 63, in the $\sum x$ column comes from 
3×1 + 2×2 + 6×3 + 3×4 + 4×5 + 1×6, while the second 
entry, −51, in the $\sum y$ row comes from 
2×1 + 2×0 + 10×(−1) + 11×(−2) + 7×(−3). 

   y \ x |   -3   -2   -1    0    1    2    3    4    5    6 |  f_y  y f_y  y^2 f_y |   Σx  yΣx
       4 |                                       2           |    2      8       32 |    8   32
       3 |                        3    2    6    3    4    1 |   19     57      171 |   63  189
       2 |                  10   15   26   19   14    2      |   86    172      344 |  190  380
       1 |         2    7   32   43   23    7    2    0    1 |  117    117      117 |  113  113
       0 |         2   28   50   31   15    2    1           |  129      0        0 |   39    0
      -1 |        10   32   38    6    1                     |   87    -87       87 |  -44   44
      -2 |        11   28    4                               |   43    -86      172 |  -50  100
      -3 |    3    7    7                                    |   17    -51      153 |  -30   90
     f_x |    3   32  102  134   98   67   34   22    6    2 |  500    130     1076 |  289  948
   x f_x |   -9  -64 -102    0   98  134  102   88   30   12 |  289
 x^2 f_x |   27  128  102    0   98  268  306  352  150   72 | 1503
      Σy |   -9  -51 -102    6   76   80   63   47   16    4 |  130
     xΣy |   27  102  102    0   76  160  189  188   80   24 |  948

$m_{10}$ = 289/500 = 0.578. 
$m_{01}$ = 130/500 = 0.260. 
$s_1^2$ = 1503/500 − (0.578)² = 2.672. 
$s_2^2$ = 1076/500 − (0.260)² = 2.084. 
$r s_1 s_2$ = 948/500 − (0.578)(0.260) = 1.746. 

Hence 

$s_1$ = 1.635, $s_2$ = 1.444, r = 1.746/(1.635 × 1.444) = +0.74. 
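
The arithmetic of the first method can be reproduced mechanically. The sketch below is illustrative only: the dictionary layout and the names TABLE and product_moment_r are assumptions, with the cell frequencies entered in the coded class units of the scheme above (x = 0 for the class marked 90, y = 0 for the class marked 30).

```python
from math import sqrt

# TABLE[y][x] = class frequency, in coded units (an assumed data layout).
TABLE = {
    4: {4: 2},
    3: {1: 3, 2: 2, 3: 6, 4: 3, 5: 4, 6: 1},
    2: {0: 10, 1: 15, 2: 26, 3: 19, 4: 14, 5: 2},
    1: {-2: 2, -1: 7, 0: 32, 1: 43, 2: 23, 3: 7, 4: 2, 5: 0, 6: 1},
    0: {-2: 2, -1: 28, 0: 50, 1: 31, 2: 15, 3: 2, 4: 1},
    -1: {-2: 10, -1: 32, 0: 38, 1: 6, 2: 1},
    -2: {-2: 11, -1: 28, 0: 4},
    -3: {-3: 3, -2: 7, -1: 7},
}


def product_moment_r(table):
    """Sums about the provisional origin, then transfer to the true means."""
    cells = [(x, y, f) for y, row in table.items() for x, f in row.items()]
    n = sum(f for _, _, f in cells)
    sx = sum(x * f for x, _, f in cells)
    sy = sum(y * f for _, y, f in cells)
    sxx = sum(x * x * f for x, _, f in cells)
    syy = sum(y * y * f for _, y, f in cells)
    sxy = sum(x * y * f for x, y, f in cells)
    m10, m01 = sx / n, sy / n
    s1_sq = sxx / n - m10**2          # mean-square-deviation of x
    s2_sq = syy / n - m01**2          # mean-square-deviation of y
    p11 = sxy / n - m10 * m01         # mean-product-deviation
    r = p11 / sqrt(s1_sq * s2_sq)
    return {"n": n, "sx": sx, "sy": sy, "sxx": sxx, "syy": syy,
            "sxy": sxy, "r": r}


if __name__ == "__main__":
    print(product_moment_r(TABLE))
```

The sums agree with the marginal totals 289, 130, 1503, 1076 and 948 of the scheme, and r rounds to 0.74.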

Error of Sampling.* The coefficient r, computed in 
this way, has a probability distribution (Chapter VII) 
depending on the probability distribution of x and y and 
on the number n. If the distribution of x and y is normal 
the sampling distribution of r tends with increasing n to 
become a normal distribution with variance $(1 - \rho^2)^2/n$. 
Consequently the standard deviation of the sampling 
distribution of r (“standard error” of r) is approximately 
$(1 - r^2)/\sqrt{n}$; but this is only so when n is large, let us 
say n > 100, and when $|\rho|$ is not too high, let us say not 
greater than 0.5. In fact it is better in most cases, and 
certainly when n is small, to estimate by tables of R. A. 
Fisher’s distribution of r within what range r may be 
taken as an estimate of $\rho$. 

* This paragraph may be postponed until Chapter VII has 
been studied. 

(ii) The second method of computing r depends on 
the simple observation that while by summing the 
frequencies in columns we obtain the distribution in x 
alone, and by rows that in y alone, if we sum along 
diagonals inclined at 45° to the horizontal we obtain a 
distribution of $x - y$; for all class-rectangles in any such 
diagonal correspond to the same value of $x - y$. Thus 
from diagonal frequencies we may compute the mean of 
$x - y$, namely $m_{10} - m_{01}$, checking the individual 
means as computed from row and column marginal 
frequencies; and we may also compute the mean-square-deviation 
of $x - y$ from its mean. Now the value of this is 

$$ \sum [(x - m_{10}) - (y - m_{01})]^2/n = s_1^2 - 2rs_1s_2 + s_2^2. $$

But $s_1^2$ and $s_2^2$ are already known from the row and 
column marginal distributions; hence r is easily found. 

Ex. 2. Taking the same example as before and summing 
along the diagonals, we find the frequency distribution of 
$x - y$ to be 

   x − y |  -3   -2   -1    0    1    2    3    4    5 |   N
       f |   2   22   87  173  149   57    8    1    1 | 500

The mean is found to be 0.318, checking the values 
$m_{10}$ = 0.578, $m_{01}$ = 0.260. The mean-square-deviation from 
the mean is 

$s_1^2 - 2rs_1s_2 + s_2^2$ = 1.265, 

whence 

r = ½(2.672 + 2.084 − 1.265)/(1.635 × 1.444) 
  = ½ × 3.491/2.361 = 0.74, 

as before. 
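
The diagonal method of Ex. 2 can be sketched in a few lines. This is an illustration under the assumption that the marginal sums 289, 1503, 130 and 1076 (in coded class units, n = 500) are already available from the first method; the variable names are hypothetical.

```python
from math import sqrt

# Frequency distribution of x - y obtained by summing along diagonals.
diff_freq = {-3: 2, -2: 22, -1: 87, 0: 173, 1: 149, 2: 57, 3: 8, 4: 1, 5: 1}

n = sum(diff_freq.values())
mean_d = sum(d * f for d, f in diff_freq.items()) / n          # m10 - m01
msd_d = sum(d * d * f for d, f in diff_freq.items()) / n - mean_d**2

# Mean-square-deviations of x and y from the marginal distributions.
s1_sq = 1503 / n - (289 / n) ** 2
s2_sq = 1076 / n - (130 / n) ** 2

# msd of (x - y) = s1^2 - 2 r s1 s2 + s2^2, solved for r.
r = (s1_sq + s2_sq - msd_d) / (2 * sqrt(s1_sq * s2_sq))

if __name__ == "__main__":
    print(n, round(mean_d, 3), round(msd_d, 3), round(r, 2))
```

The same lines applied to the distribution of x + y, with the sign of the middle term reversed, would supply the check discussed next.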

Notice that we have here no check on r. That could be 
provided by summing along the other set of diagonals at 
right angles to those which have been taken. They correspond 
to constant values of $x + y$, and so their mean-square-deviation 
from the mean is $s_1^2 + 2rs_1s_2 + s_2^2$. 

Ex. 3. The distribution of $x + y$, obtained by summing 
along the other set of diagonals in the correlation table, is: 

   x + y |  -6  -5  -4  -3  -2  -1   0   1   2   3   4   5   6   7   8   9 |   N
       f |   3   7  18  38  38  68  63  64  68  40  37  23  20   6   6   1 | 500

Compute r from this distribution. Notice how much more 
widely spread it is than that of $x - y$ in Ex. 2. 





Sheppard’s Corrections. Sheppard’s correction for 
variance in grouped data is applicable to the mean-square-deviations 
of x and y, but not to the mean-product-deviation. 
On the whole, however, it is better to work 
without the corrections, because the tables of Fisher’s 
sampling distribution of r do not take account of grouping. 

51. Correlation of Variates with Poissonian Distribution. 
It is not necessarily true that sampling from 
a fourfold population always produces as a limiting case 
a bivariate normal correlation function. Suppose, for 
example, that p and p' are of order $1/n$. Then $p_{11}$ may 
be of order $1/n^2$, but may also be of order $1/n$. 

The f.m.g.f. of a sample of n individuals with replacement 
is seen from 47 (1) to be 

$$ (1 + p\alpha + p'\beta + p_{11}\alpha\beta)^n. \tag{1} $$

When $p = \mu/n$ and $p' = \mu'/n$, and $p_{11}$ is $O(1/n^2)$, this 
g.f. tends to $\exp(\mu\alpha + \mu'\beta)$, which shows that with increasing 
n the probability function reduces to the product 
of independent Poissonian functions, and is in fact 

$$ \psi(x, y) = e^{-\mu-\mu'}\,\mu^x \mu'^y / x!\,y!. \tag{2} $$

On the other hand, when $p_{11}$ is $O(1/n)$, we have 

$$ (1 + p\alpha + p'\beta + p_{11}\alpha\beta)^n = [(1 + p\alpha)(1 + p'\beta) + (p_{11} - pp')\alpha\beta]^n, $$

which tends to 

$$ \exp(\mu\alpha + \mu'\beta + \mu_{11}\alpha\beta), \tag{3} $$

where 

$$ \mu_{11} = n(p_{11} - pp') = n(p_{11}p_{00} - p_{10}p_{01}). \tag{4} $$

Evidently $\mu_{11}$ is the ordinary product-moment about the 
means. 

Now putting $\alpha = t - 1$, $\beta = u - 1$ in (3), we derive the 
correlation function $\psi(x, y)$ as the coefficient of $t^x u^y$ in the 
probability g.f. 

It is found without difficulty to be 

$$ \psi(x, y) = e^{-(\mu+\mu'-\mu_{11})}\,\frac{(\mu-\mu_{11})^x(\mu'-\mu_{11})^y}{x!\,y!} \left[1 + \frac{xy\,\mu_{11}}{(\mu-\mu_{11})(\mu'-\mu_{11})} + \frac{x(x-1)\,y(y-1)\,\mu_{11}^2}{2!\,(\mu-\mu_{11})^2(\mu'-\mu_{11})^2} + \dots\right], \tag{5} $$

where the polynomial in the bracket terminates after $x + 1$ 
or $y + 1$ terms, whichever is the lesser. This function is 
the bivariate Poissonian function. It may be proved that 
the loci of means of sections corresponding to constant 
x or y are straight lines, so that here again we have linear 
regression. The same property may be proved to hold 
for binomial and hypergeometric correlation functions. 
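
As a numerical sanity check, the coefficient of $t^x u^y$ in the probability g.f. obtained from (3) can be summed directly. The sketch below is illustrative: the function name, the truncation at 40 and the parameter values are assumptions, and the finite sum over k is the terminating polynomial written term by term.

```python
from math import exp, factorial


def bivariate_poisson(x, y, mu, mu_prime, mu11):
    """Coefficient of t^x u^y in
    exp(mu*(t-1) + mu_prime*(u-1) + mu11*(t-1)*(u-1))."""
    a, b = mu - mu11, mu_prime - mu11
    total = 0.0
    for k in range(min(x, y) + 1):   # terminates after min(x, y) + 1 terms
        total += (a ** (x - k) * b ** (y - k) * mu11**k
                  / (factorial(x - k) * factorial(y - k) * factorial(k)))
    return exp(-(mu + mu_prime - mu11)) * total


def moments(mu, mu_prime, mu11, limit=40):
    """Total probability, mean of x, and product-moment about the means,
    summed over 0 <= x, y < limit (the tail beyond is negligible here)."""
    grid = [(x, y, bivariate_poisson(x, y, mu, mu_prime, mu11))
            for x in range(limit) for y in range(limit)]
    total = sum(p for _, _, p in grid)
    mean_x = sum(x * p for x, _, p in grid)
    mean_y = sum(y * p for _, y, p in grid)
    cov = sum((x - mean_x) * (y - mean_y) * p for x, y, p in grid)
    return total, mean_x, cov


if __name__ == "__main__":
    print(moments(2.0, 1.5, 0.5))
```

For the assumed values the total should be 1, the mean of x should be $\mu$, and the product-moment should equal $\mu_{11}$; with $\mu_{11} = 0$ the function reduces to the product of two Poissonian functions.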

Both the normal function and the Poissonian correlation 
function can be derived, like the corresponding functions 
for one variate, on more general grounds than sampling 
from a fourfold table, by a compounding of elementary 
increments achieved by addition of bivariate seminvariant 
g.f.’s ; but this derivation lies beyond our present scope. 


52. Non-Linear Correlation and Regression. A 
linear regression between correlated variates is rather 
exceptional. The loci of means of arrays usually deviate 
from straightness by more than can be ascribed to random 
sampling, suggesting that the underlying law of probability 
cannot be either normal or Poissonian. Non-linear 
regression curves are perhaps best estimated by fitting 
to the data suitable regression functions by the method 
of Least Squares, described in Chapter VI. In the 
non-linear case, too, the coefficient r or $\rho$ has marked 
disadvantages (it was seen for example in 48 that $\rho$ could 
be zero even when regression was perfect) and the correlation 
ratio $\eta$, devised by K. Pearson, is much to be 
preferred. 

It was proved in 49 that in normal regression all 
y-sections or arrays corresponding to constant x had the 
same variance, let us say 

$$ \sigma_{yx}^2 = \sigma_2^2(1 - \rho^2), \tag{1} $$

so that 

$$ \rho^2 = 1 - \sigma_{yx}^2/\sigma_2^2. \tag{2} $$

Now $1 - \rho^2$ may be regarded as measuring something 
complementary or antithetic to correlation. The word 
alienation is sometimes used to describe this quality, but 
alienation suggests repulsion and is too strong a term. 
Residual dispersion expresses the meaning better. In 
non-linear regression the variance of the y-sections, namely 

$$ \sigma_{yx}^2 = \int (y - \bar{y}_x)^2 \phi(x, y)\,dy \Big/ \int \phi(x, y)\,dy, \tag{3} $$

where 

$$ \bar{y}_x = \int y\,\phi(x, y)\,dy \Big/ \int \phi(x, y)\,dy, \tag{4} $$

the mean of the y-section corresponding to constant x, 
is not usually constant. We may, however, take the mean 
of these variances of y-sections over all sections, that is, 
over all values of x, obtaining 

$$ \bar{\sigma}_{yx}^2 = \iint (y - \bar{y}_x)^2 \phi(x, y)\,dx\,dy, \tag{5} $$

which may be regarded as the mean-square-deviation of 
y from its regression value $\bar{y}_x$ taken over the whole 
distribution. Standardizing this by dividing by the total 
variance of y, namely $\sigma_2^2$, and writing 

$$ \eta_{yx}^2 = 1 - \bar{\sigma}_{yx}^2/\sigma_2^2, \tag{6} $$

we define a coefficient $\eta_{yx}$ analogous to $\rho$ in (2). This 
coefficient is the correlation-ratio of y on x. The closer 
it approaches 1, the smaller is the residual dispersion and 
the closer the values y lie to their regressional means. 

In the same way, by interchanging the rôles of x and y 
in the above derivation, we define $\eta_{xy}$, the correlation-ratio 
of x on y. As to the signs of $\eta_{yx}$ and $\eta_{xy}$, there are 
cases where these can be attributed by graphical or other 
considerations, but there are also cases, for example when 
the curve of regression is a periodic curve with several 
oscillations, when sign has no meaning. 

The estimates of $\eta_{yx}$ and $\eta_{xy}$, as derived from an actual 
frequency distribution presented as a correlation table, 
will be denoted by $e_{yx}$ and $e_{xy}$. We define them 
analogously; thus 

$$ e_{yx}^2 = 1 - \bar{s}_{yx}^2/s_2^2, \tag{7} $$

where $\bar{s}_{yx}^2$ is the mean, over all y-arrays (columns of the 
correlation table), of the mean-square-deviation of y from 
the mean $\bar{y}_x$ of the column. In computing this mean of 
mean-square-deviations the column frequencies, marginally 
entered, serve as class frequencies. The effective arithmetical 
arrangement of the computation will be given later. 

That the correlation-ratio is actually a ratio, namely 
the ratio of the standard deviation of the means of arrays 
to the total standard deviation of the variate, will now 
be proved by considering 

Lemma. If k sets of $n_1, n_2, \dots, n_k$ observations, 
with respective means $M_j$ and mean-square-deviations 
$s_j^2$, $j = 1, 2, \dots, k$, are pooled in an aggregate of 
$n = n_1 + n_2 + \dots + n_k$ observations with mean M and 
mean-square-deviation $s^2$, then 

$$ ns^2 = \sum_j n_j(s_j^2 + c_j^2), \tag{8} $$

where $c_j = M - M_j$. 

This follows at once from the fact that the mean-square-deviation 
of the jth set about M is $s_j^2 + c_j^2$. 
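
The lemma is easily confirmed on made-up numbers. The three small sets below are an arbitrary assumption, and exact fractions are used so that the two sides of (8) can be compared without rounding; the function names are illustrative.

```python
from fractions import Fraction


def mean_and_msd(values):
    """Mean and mean-square-deviation from the mean, as exact fractions."""
    n = len(values)
    m = Fraction(sum(values), n)
    msd = sum((v - m) ** 2 for v in values) / n
    return m, msd


def pooled_identity(sets):
    """Return both sides of n s^2 = sum_j n_j (s_j^2 + c_j^2), c_j = M - M_j."""
    pooled = [v for s in sets for v in s]
    big_m, s_sq = mean_and_msd(pooled)
    lhs = len(pooled) * s_sq
    rhs = 0
    for s in sets:
        m_j, s_j_sq = mean_and_msd(s)
        rhs += len(s) * (s_j_sq + (big_m - m_j) ** 2)
    return lhs, rhs


if __name__ == "__main__":
    print(pooled_identity([[1, 2, 6], [4, 4, 5, 7], [10, 12]]))
```

Both sides come out as the same fraction for any choice of sets.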

Applying this lemma to the column-arrays of a 
correlation table, we have 

$$ ns_2^2 = \sum_j f_j(s_j^2 + M_j^2), \tag{9} $$

where $M_j$ and $s_j^2$ are the mean and mean-square-deviation 
of the jth column. (The origin is the mean of both x 
and y.) This is the same as 

$$ s_2^2 = \bar{s}_{yx}^2 + s_m^2, \tag{10} $$

the second term denoting the mean-square-deviation of 
column means, when these are associated with column 
frequencies. 

The above result holds for the sample. A similar 
result can be proved for $\sigma_2^2$ in the population, integrals 
replacing sums, and variance replacing mean-square-deviation 
from means. The result may be put in words 
thus: total variance $\sigma_2^2$ of y is equal to mean of variances 
$\sigma_{yx}^2$ of y-arrays plus variance of means of arrays. It 
is another example of analysis of variance (26, 75). 

Hence, by (6) and (7), 

$$ \eta_{yx}^2 = \sigma_m^2/\sigma_2^2 \quad\text{and}\quad e_{yx}^2 = s_m^2/s_2^2, \tag{11} $$

so that $\eta_{yx}^2$ and $e_{yx}^2$ are displayed as ratios of variances 
or of mean-square-deviations. 

53. Computation of Correlation-Ratios. The result 
52 (11) permits us to compute $e_{yx}$ and $e_{xy}$ by a simple 
extension of the first method of 50 for computing r, for 
the means of rows and columns are given by the entries 
in the column headed $\sum x$ and the row headed $\sum y$, divided 
respectively by the frequencies $f_y$ and $f_x$. Also, the means 
of these entries are $m_{10}$ and $m_{01}$. Hence, computing 
mean-square-deviations from means in the usual way, we have 

$$ e_{yx}^2 = \left[\sum \{(\textstyle\sum y)^2/f_x\}/n - m_{01}^2\right]/s_2^2, \tag{1} $$

and similarly for $e_{xy}^2$. We thus annex two rows 
$(\sum y)^2$, $(\sum y)^2/f_x$ and two columns $(\sum x)^2$, $(\sum x)^2/f_y$ to the computation 
scheme for r. 

Example. The additional rows and columns for the 
example of 50 (Binet I.Q. and Verbal Test Score) are as 
follows: 

          x |   -3    -2     -1    0     1     2     3     4    5    6 | Sum
     (Σy)^2 |   81  2601  10404   36  5776  6400  3969  2209  256   16 |
 (Σy)^2/f_x |   27    81    102    0    59    96   117   100   43    8 | 633

     y   (Σx)^2   (Σx)^2/f_y
     4       64        32
     3     3969       209
     2    36100       420
     1    12769       109
     0     1521        12
    -1     1936        22
    -2     2500        58
    -3      900        53
   Sum                915

$e_{yx}^2$ = [633/500 − (0.260)²]/(1.444)² = 1.198/2.085 = 0.575, 
$e_{xy}^2$ = [915/500 − (0.578)²]/(1.635)² = 1.496/2.673 = 0.560. 

Hence $e_{yx}$ and $e_{xy}$ are equal to 0.76 and 0.75, whereas the 
value of r was found to be 0.74. 
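
The correlation-ratios of this example can be recomputed from the cell frequencies. The sketch below is illustrative: the dictionary layout and the names TABLE and correlation_ratios are assumptions, and the frequencies are in the coded class units of 50. It works with unrounded quotients, whereas the scheme above rounds each quotient to a whole number.

```python
from math import sqrt

# TABLE[y][x] = class frequency, in coded units (an assumed data layout).
TABLE = {
    4: {4: 2},
    3: {1: 3, 2: 2, 3: 6, 4: 3, 5: 4, 6: 1},
    2: {0: 10, 1: 15, 2: 26, 3: 19, 4: 14, 5: 2},
    1: {-2: 2, -1: 7, 0: 32, 1: 43, 2: 23, 3: 7, 4: 2, 5: 0, 6: 1},
    0: {-2: 2, -1: 28, 0: 50, 1: 31, 2: 15, 3: 2, 4: 1},
    -1: {-2: 10, -1: 32, 0: 38, 1: 6, 2: 1},
    -2: {-2: 11, -1: 28, 0: 4},
    -3: {-3: 3, -2: 7, -1: 7},
}


def ratio_squared(cells):
    """cells: (array label, value, frequency). Returns the mean-square-deviation
    of array means divided by the total mean-square-deviation of the value."""
    n = sum(f for _, _, f in cells)
    mean = sum(v * f for _, v, f in cells) / n
    total_msd = sum(v * v * f for _, v, f in cells) / n - mean**2
    sums, freqs = {}, {}
    for label, v, f in cells:
        sums[label] = sums.get(label, 0) + v * f
        freqs[label] = freqs.get(label, 0) + f
    between = sum(sums[k] ** 2 / freqs[k] for k in sums) / n - mean**2
    return between / total_msd


def correlation_ratios(table):
    """Return (e_yx, e_xy): y grouped by columns x, and x grouped by rows y."""
    y_by_x = [(x, y, f) for y, row in table.items() for x, f in row.items()]
    x_by_y = [(y, x, f) for y, row in table.items() for x, f in row.items()]
    return sqrt(ratio_squared(y_by_x)), sqrt(ratio_squared(x_by_y))


if __name__ == "__main__":
    e_yx, e_xy = correlation_ratios(TABLE)
    print(round(e_yx, 2), round(e_xy, 2))
```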


54. Correlation of Non-Metrical Characters. When 
the characters in a double classification are purely qualitative, 
capable of being graded by a recognizable difference 
in category, but not susceptible of measurement by metrical 
scale, we must fall back on the contingency table of h × k 
rectangular cells, with corresponding cell-frequencies. 
Since variances and product-moments are now out of the 
question, the presence or absence of correlation must be 
inferred from the cell-frequencies themselves, according 
to the manner in which they deviate from presumptive 
cell-frequencies in the corresponding case of independence. 
Consider, for example, the following contingency table 
due to Galton (Proc. Roy. Soc., 40 (1886), p. 42), illustrating 
the incidence of eye-colour in a group of fathers and eldest 
sons. 



 (i)         E_1     E_2     E_3     E_4  |    p'
     F_1   0.194   0.083   0.025   0.056  |  0.358
     F_2   0.070   0.124   0.034   0.036  |  0.264
     F_3   0.041   0.041   0.055   0.043  |  0.180
     F_4   0.030   0.036   0.023   0.109  |  0.198
     p     0.335   0.284   0.137   0.244  |  1.000

 (ii)        E_1     E_2     E_3     E_4  |    p'
     F_1   0.120   0.102   0.049   0.087  |  0.358
     F_2   0.089   0.075   0.036   0.064  |  0.264
     F_3   0.060   0.051   0.025   0.044  |  0.180
     F_4   0.066   0.056   0.027   0.049  |  0.198
     p     0.335   0.284   0.137   0.244  |  1.000


The colour categories are 1, blue; 2, blue-green or grey;
3, dark grey or hazel; 4, brown. n = 1000.

Summing down columns we obtain frequency estimates of
the probabilities p of respective eye-colours for fathers
irrespective of sons, and summing along rows, frequency
estimates of probabilities p' for sons alone. These marginal
frequencies or relative frequencies may be recombined
again to form a multiplication table, which is to serve
for comparison with the original table. The marginal
frequencies in the second table are the same as in the first,
but the cell-frequencies, derived as they are by applying
the law of compound probability, represent what would
have been the state of affairs with the same marginal
frequencies had there been independence. Of course it
must be observed that if we use, as here, not the a priori
marginal probabilities but only the sample estimates given
by the marginal frequencies, this procedure is bound to
affect the sampling probability of the coefficient or criterion
of comparison.

The coefficient $\chi^2$ is a quadratic function of the
deviations of cell-frequencies in the actual from those
in the presumptive independent case; it is a kind of
composite weighted variance, with application not merely
to contingency tables but also to any comparison of actual
frequency classifications, single or multiple, with presumptive
ones. It was first employed by Lexis, but the
nature of its probability distribution was first obtained by
K. Pearson in 1900.

Distribution of $\chi^2$. The derivation of the $\chi^2$
distribution involves the general multivariate normal
correlation function, which is outside the scope of this
short book; but the outlines may be sketched. If the
a priori probabilities in the k classes are $p_1, p_2, \ldots, p_k$,
then the frequencies in the classes are characterized by
the multivariate multinomial g.f.

$$(p_1 t_1 + p_2 t_2 + \ldots + p_k t_k)^n. \tag{1}$$

If the number of individuals found in a class is $n_j$,
the expected number being $np_j$, we may denote the class
deviation from mean value or expectation $n_j - np_j$ by
$\epsilon_j$. Then, since $\Sigma n_j = n$ and also $\Sigma np_j = n$, we must have

$$\Sigma \epsilon_j = 0, \tag{2}$$

a relation in virtue of which only k−1 of the deviations
$\epsilon_j$, let us say the first k−1, are independent. We therefore
put $t_k = 1$ in the g.f. and consider what happens as n
increases. Putting $t_j = e^{\alpha_j}$, we find that, provided no $p_j$
is $O(n^{-1})$, the multivariate m.g.f. of the class deviations
tends to

$$\exp\Big[\tfrac12 n \Big(\sum_{j=1}^{k-1} p_j \alpha_j^2 - \sum_{i,j=1}^{k-1} p_i p_j \alpha_i \alpha_j\Big)\Big]. \tag{3}$$

This is an m.g.f. of normal correlated probability in
the k−1 deviations, which on reversion gives the probability
differential of the $\epsilon_j$ as

$$c\,\exp\Big[-\tfrac12 n^{-1} \sum_{j=1}^{k} \epsilon_j^2/p_j\Big]\, d\epsilon_1\, d\epsilon_2 \ldots d\epsilon_{k-1}. \tag{4}$$

Thus the probability, or probability density, of a set
of deviations is a function of the quadratic expression

$$\chi^2 = \Sigma \epsilon_j^2/np_j, \tag{5}$$

which is Pearson's $\chi^2$. Having decided to use the composite
$\chi^2$ rather than the individual deviations as a
criterion of the nearness to expectation, we transform the
differential (4) into a differential in $\chi^2$ itself, when it
assumes the shape

$$dp = c\,(\chi^2)^{\frac12(k-3)} e^{-\frac12\chi^2}\, d\chi^2, \tag{6}$$

the probability function being of Pearson's Type III or
Gamma type.

The probability of obtaining a value of $\chi^2$ not exceeding
a given $\chi_0^2$ is therefore

j . (7) 

Tables of this function P have been computed for various
values of k, the number of classes, and $\chi^2$.

Degrees of Freedom in $\chi^2$. When the class probabilities
$p_j$ are given a priori the distribution of $\chi^2$ for
k classes is expressed, as we have seen, by

$$dp = c\,(\chi^2)^{\frac12(k-3)} e^{-\frac12\chi^2}\, d\chi^2. \tag{1}$$

But the presumptive class probabilities are not always
given a priori; in a contingency table, for example, they
may be estimated by recombining in multiplication the
marginal relative frequencies of the table which is being
tested. Now such a procedure forces the marginal totals
of the presumptive table of independence to agree with
those of the contingency table. This forcing reduces the
number of independent class deviations from expectation.
For example, in a 4-by-6 table there are 24 classes, of
which 23 have independent frequencies, since the total of
relative frequencies must be 1. This is in the absence
of forcing. On the other hand, if the 10 marginal totals
are preassigned, then there are only 3 × 5 or 15 independent
class frequencies, as may be seen by putting these
15 in the top left part of the table, so as to fill 3 rows
and 5 columns, and observing that all the others can then
be filled in by reference to the marginal frequencies.
In general, in an h × j table with forced marginal
agreement, there are only (h−1)(j−1) independent class
frequencies.
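
The count of independent cells can be checked numerically. The following is a minimal sketch, not part of the original text: it writes each preassigned marginal total as a linear equation in the cell frequencies and subtracts the rank of that system from the number of cells. The function name `free_cells` is a hypothetical helper, and NumPy is assumed to be available.

```python
import numpy as np

def free_cells(rows, cols):
    """Number of cell frequencies left free once every marginal total is fixed."""
    constraints = []
    for r in range(rows):                 # one linear equation per row total
        m = np.zeros((rows, cols))
        m[r, :] = 1
        constraints.append(m.ravel())
    for c in range(cols):                 # one linear equation per column total
        m = np.zeros((rows, cols))
        m[:, c] = 1
        constraints.append(m.ravel())
    # The equations are not all independent (row totals and column totals
    # share the same grand total), so the rank is rows + cols - 1.
    rank = np.linalg.matrix_rank(np.array(constraints))
    return rows * cols - rank

print(free_cells(4, 6))   # 3 x 5 = 15 for the 4-by-6 table of the text
```

The rank falls one short of the number of marginal totals because the row totals and the column totals add to the same grand total, which is why the result is (h−1)(j−1) rather than hj−h−j.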

Now, in preparing for comparison by the $\chi^2$-test in such
a case, we should not integrate $c\,\exp(-\tfrac12\chi^2)$
over all the previously independent $\epsilon_j$, for by so doing we
would be unfairly including combinations of the $\epsilon_j$ which
have been precluded by the procedure of forced agreement.
We ought to transform $\chi^2$ so that it is expressed in terms
of the restricted set of independent $\epsilon_j$. It was shown by
R. A. Fisher that when this is done the modified element
of probability is simply

$$dp = c\,(\chi^2)^{\frac12(k-m-3)} e^{-\frac12\chi^2}\, d\chi^2, \tag{2}$$

where m is the number of restrictive relations, reducing
the number of independent $\epsilon_j$ from k−1 to k−m−1. It
is usual to call k−m−1 the number of degrees of freedom.

The table of $P(\chi^2)$ is therefore best constructed, and
consulted, with reference not to k, the number of classes,
but to k−m−1, the number of degrees of freedom; and
this applies not only to contingency tables but to all
situations in which a presumptive probability distribution
is obtained from a frequency distribution by a partial
forcing of agreement, the equating of moments for example,
involving restrictions on the deviations $\epsilon_j$. These restrictions
must be linear, that is to say, they must involve the
$\epsilon_j$ in the 1st degree only.

Since in the deduction of $P(\chi^2)$ we excluded the case
of very small class probabilities, we must exclude in 
practice small class frequencies. It is customary, there- 
fore, in applying the test, to pool the small frequencies 
at the ends of a distribution so as to make the classes 
contain at least 10 individuals. 

Example. The fitting of Poissonian and Type B functions
to the Rutherford-Geiger data in §42. We pool the classes
corresponding to x = 10 and over. Thus k = 11.

For the Poissonian fitting there are 9 degrees of freedom,
since the total frequency and the mean have been made to
agree in fitted curve and data. We find $\chi^2$ = 12.8, and
reference to tables shows that P = 0.20, a satisfactory value.

For the Type B there are 8 degrees of freedom, total
frequency, mean and second factorial moment having been
made to agree in fitted curve and data. We find $\chi^2$ = 10.2,
P = 0.25. The slight improvement is of little consequence;
in both cases the principal contribution to $\chi^2$ comes from
the large deviation in class x = 8.

Empirical Formula. The value of $\chi^2$ for which
P = 0.05 is often regarded as a boundary between the
reasonable and the dubious. This value of $\chi^2$ is given
with adequate approximation, for k' degrees of freedom, by
1.55(k'+2) for k' < 10, and 1.25(k'+5) for 10 ≤ k' ≤ 35.

For k' = 35 the second formula above gives the value 50,
the actual value of $\chi^2$ being 49.79. For higher values of k',
$\sqrt{2\chi^2} - \sqrt{2k'-1}$ may be treated as a standard normal variate.
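
The empirical rule is easily tried against tabulated values. The sketch below is illustrative only: the function names are hypothetical, the second formula is assumed to apply from 10 degrees of freedom upward, and the comparison figures are 5 per cent points quoted from standard $\chi^2$ tables rather than from this book.

```python
import math

def chi2_5pct_empirical(df):
    """Empirical approximation to the 5 per cent point of chi-squared."""
    if df < 10:
        return 1.55 * (df + 2)
    return 1.25 * (df + 5)        # assumed range: 10 degrees of freedom and over

def normal_deviate(chi2, df):
    """sqrt(2*chi2) - sqrt(2*df - 1), roughly standard normal for large df."""
    return math.sqrt(2.0 * chi2) - math.sqrt(2.0 * df - 1.0)

# 5 per cent points from standard chi-squared tables, for comparison only.
TABLE_5PCT = {5: 11.070, 10: 18.307, 20: 31.410, 30: 43.773}

for df, tabulated in TABLE_5PCT.items():
    print(df, round(chi2_5pct_empirical(df), 2), tabulated)
```

For these four entries the empirical value stays within about half a unit of the tabulated one, which is adequate for a first judgment of significance.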

55. Coefficients of Contingency. The possibility of
dependence between variates in a contingency table can
be tested by $\chi^2$. For Galton's data of eye-colours in §54
the value of $\chi^2$ is 266, a value so large that the probability
of independence of eye-colour between fathers and eldest
sons is negligibly small.

Attempts have been made to measure the strength of a
dependence by means of coefficients of contingency. Thus
$\chi^2$ measures, as it were, the dispersion of a grouped sample
from expectation, taken over all n individuals; and so
the mean dispersion per individual is $\chi^2/n$, a coefficient
denoted by $\phi^2$ and called by K. Pearson the mean square
contingency. Since

$$\phi^2 = \chi^2/n = \Sigma (\epsilon_j/n)^2/p_j, \tag{1}$$

it appears that $\phi^2$ is the sum of squared deviations of class
relative frequencies $n_j/n$ from the presumptive class
probabilities $p_j$, each divided by that probability.

Pearson, considering the value of $\phi^2$ for a bivariate
normal correlated distribution divided into grades of
indefinite fineness in x and y, found the relation

$$\rho^2 = \phi^2/(1+\phi^2), \tag{2}$$

and, proceeding by analogy, defined a general coefficient of
mean square contingency C by

$$C^2 = \phi^2/(1+\phi^2). \tag{3}$$

Evidently $C^2$ is zero when $\phi^2$ is zero and tends to 1 as
$\phi^2$ increases; but its interpretation for intermediate values is
not very definite.

Example. The computation of $\phi^2$ and $C^2$ for Galton's
data in §54.

The table of values $(p_{ij} - p_i p'_j)^2/p_i p'_j$ is:



                       Father's eye-colour
  Son's eye-colour     1       2       3       4      Sum
         1           0.046   0.004   0.012   0.011   0.073
         2           0.004   0.032   0.000   0.012   0.048
         3           0.006   0.002   0.036   0.000   0.044
         4           0.020   0.007   0.001   0.073   0.101
        Sum          0.076   0.045   0.049   0.096   0.266


Thus $\phi^2$ = 0.266, $C^2$ = 0.266/1.266 = 0.210, C = 0.46.
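
The whole computation for Galton's table can be sketched in a few lines. This is an illustration under stated assumptions, not the author's working: the relative frequencies are those of the contingency table in §54 with n = 1000, and the variable names are hypothetical. Because the independence table is formed here from unrounded products, the result differs in the third decimal from the 0.266 obtained above from rounded entries.

```python
import math

# Observed relative frequencies: rows are sons' categories, columns fathers'.
observed = [
    [0.194, 0.083, 0.025, 0.056],
    [0.070, 0.124, 0.034, 0.036],
    [0.041, 0.041, 0.055, 0.043],
    [0.030, 0.036, 0.023, 0.109],
]
n = 1000

row_tot = [sum(row) for row in observed]          # p' for sons
col_tot = [sum(col) for col in zip(*observed)]    # p for fathers

# Presumptive table of independence: products of the marginal frequencies.
expected = [[pr * pc for pc in col_tot] for pr in row_tot]

# Mean square contingency, chi-squared, and coefficient of contingency.
phi2 = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(4)
    for j in range(4)
)
chi2 = n * phi2
C = math.sqrt(phi2 / (1.0 + phi2))

print(round(phi2, 3), round(chi2), round(C, 2))
```

With (4−1)(4−1) = 9 degrees of freedom, a $\chi^2$ of this size lies far beyond any tabulated 5 per cent point.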

Table of $P(\chi^2)$. A table of $P(\chi^2)$, arranged in a
compact and practical form, is given in Table III of
R. A. Fisher's Statistical Methods for Research Workers,
8th edition, pp. 110-111; also in the Statistical Tables for
Biological, Agricultural and Medical Research of Fisher and
Yates (Oliver and Boyd, 1938), p. 27.

For practice in the $\chi^2$-test, the reader may examine
whether the experimental data of the examples on pp. 49
and 50 are in good accord with the theoretical distributions,
rectangular and binomial, there given.


THE METHOD OF LEAST SQUARES : MULTIVARIATE
CORRELATION : POLYNOMIAL AND HARMONIC
REGRESSION

56. Multivariate Regression. When distributions in
more than two correlated variates are encountered, an
important question is the determination of the optimal
value (sometimes in the sense of mean value, sometimes
in the sense of most probable value) of a particular variate
in terms of the values of all or any given set of the other
variates. We have seen that in normal bivariate distributions
the loci of such optimal values are straight
regression lines. In normal correlation of many variates
the corresponding loci are still linear, expressed by
equations of the first degree. For three variates there
are three planes of regression, for n variates there is a
sheaf of n hyperplanes, each given by a linear equation
expressing a particular variate in terms of the other n−1
variates.

It was proved by Yule that these various linear loci 
could be obtained without the assumption of normal 
distribution by using the method of Least Squares, which 
we now describe. 

57. The Method of Least Squares. The method of 
Least Squares originated in the practical necessity of 
combining discrepant observations of a single unknown 
constant, or discrepant observational equations in several
unknowns, in such a way as to obtain best estimates of 
the unknown or unknowns, under some accepted criterion. 

Discrepant measures are inevitable in repeated observations,
even when every effort has been made to keep
conditions constant. The conditions can never be identi- 
cally realized a second time. However delicate the 
instrument of measurement, there are innumerable fine 
and uncontrollable variations inherent in its parts and 
their adjustment and the readings, to say nothing of the 
inaccuracies of the observer. Hence, just as in the
throwings (§4) of a coin, we have varying phases of a
system S. Thus repeated measures of a supposedly
unique physical constant are found to be discordant,
the truth being that they are a sample from a certain
probability distribution depending on S. In the same
way, when linear combinations or other functions of 
several unknowns are measured, the number of observations 
exceeding the number of unknowns, the equations so 
derived are nearly always found to be inconsistent. 

In 1805 Legendre proposed, as a convenient method 
for reducing certain astronomical observations, that the 
‘‘ best value ” should be taken as that for which the sum 
of squared deviations of the observations was least. This
is the principle of Least Squares. It can be justified under
the assumptions (i) that the measures are normally 
distributed and (ii) that the best value has maximum 
probability density. This derivation is mathematically the 
simplest and most rapid, but it unduly limits the types 
of error distribution. A more comprehensive derivation 
postulates that the best value is (i) a consistent or unbiassed 
linear combination of the observations and (ii) has minimum 
variance. It is remarkable that the two quite different 
sets of postulates lead to exactly the same equations for 
the unknown or unknowns. 

58. Precision, Weight, Errors and Residuals.
Measuring instruments of differing precision may be
characterized by their standard error, or variance of error,
in the reading given by them of some assigned measure.
The variance may be estimated by repeated trials. It is
traditional here to use the term weight, defined as proportional
to the reciprocal of variance of error. For example,
if in determining a distance of 5000 yards the standard
error of a range-finder A is estimated to be half that of
a range-finder B, the weights $w_A$ and $w_B$ assigned to
readings made by A and B would be as 4 to 1, in favour of A.

Finally, it must always be kept in mind that "true"
values (if indeed the word "true" admits at all of definite
meaning) are unknown and must remain unknown; so
that the errors, being deviations from an unknown value,
are likewise unknown. True values must be estimated by
appropriate substitutes, namely, best or optimal values,
and errors by the deviations of the observed from the
optimal values. These deviations are distinguished from
the errors which they represent by being called residuals.
Errors are $a_j - a$, residuals are $a_j - \hat{a}$, where a
is the true value, $a_j$ an observed value of a, and $\hat{a}$ the
optimal value of a. If there are n observations, the n
residuals are estimates of the n errors; and the n errors
are themselves only a finite selection under the law of
probability, which characterizes the circumstances of
measurement.

59. Repeated Measurements of a Single Unknown.
The estimate by Least Squares is found by minimizing
the sum of weighted squares of residuals. The minimum of

$$S^2 = \Sigma w_j (x_j - \hat{x})^2 \tag{1}$$

is given by $\partial S^2/\partial \hat{x} = 0$, so that

$$\hat{x} = \Sigma w_j x_j / \Sigma w_j. \tag{2}$$

The optimal value of x thus appears as a weighted mean
of the observations. If the observations are all of
equal weight the optimal value is thus the arithmetic
mean.

Variance of Optimal Value. The variance of $\hat{x}$ in
(2) is (§15)

$$\sigma_{\hat{x}}^2 = \Sigma w_j^2 \sigma_j^2 / (\Sigma w_j)^2 = \sigma^2 / \Sigma w_j, \tag{3}$$

where $\sigma_j^2$ is the variance of $x_j$ and $\sigma^2$ is the variance of
an observation of unit weight, so that $w_j = \sigma^2/\sigma_j^2$. Thus
the weight of $\hat{x}$ is the sum of the weights of the $x_j$. In particular
the weight of the arithmetic mean of n values is n
times the weight of any $x_j$.

Variance of Residuals in Case of Equal Weight.
If the observations are all of unit weight the residual
$e_j$ is

$$x_j - \bar{x} = (n-1)x_j/n - (x_1 + x_2 + \ldots + x_n - x_j)/n. \tag{4}$$

Thus the variance of $e_j$ is (§15)

$$(n-1)^2\sigma^2/n^2 + (n-1)\sigma^2/n^2 = (n-1)\sigma^2/n. \tag{5}$$

It follows that an estimate of $\sigma^2$ is given by dividing
the sum of squared residuals not by n but by n−1.

Ex. 1. The author made 30 bisections by eye of lines of
constant length. The distribution of x, the length in cm. of
the segment to the left of the point of bisection, was:

  x     7.6  7.65  7.75  7.8  7.85  7.9  8.0  8.1  8.15  8.2  8.25  8.45    n
  n_j    2    3     1     4    4     2    4    2    3     2    2     1     30

Estimate the length of the half line and the standard error.

Ex. 2. Do the same for the results given by a second
person:

  x     7.7  7.75  7.8  7.85  7.9  7.95  8.0  8.05  8.1  8.15  8.2  8.3    n
  n_j    1    1     1    4     3    5     4    5     3    1     1    1    30

Ex. 3. Compare the precision of the two persons by
assigning weights. By a weighted combination estimate the
length of half the line from all 60 bisections, and assign a
standard error. (The length of the line was actually 16 cm.)
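
One way of working these three exercises is sketched below. It is an illustration only: the frequency rows of the printed tables are run together in places, and the split into twelve frequencies summing to 30 is an assumed reading; the helper `summarize` is hypothetical. The variance of a single bisection is estimated with the divisor n−1, and each person's mean is weighted by the reciprocal of its estimated variance.

```python
import math

def summarize(values, freqs):
    """Return n, the mean, and the variance estimate with divisor n - 1."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    ss = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return n, mean, ss / (n - 1)

# Assumed reading of the two frequency tables (each sums to 30).
x1 = [7.6, 7.65, 7.75, 7.8, 7.85, 7.9, 8.0, 8.1, 8.15, 8.2, 8.25, 8.45]
f1 = [2, 3, 1, 4, 4, 2, 4, 2, 3, 2, 2, 1]
x2 = [7.7, 7.75, 7.8, 7.85, 7.9, 7.95, 8.0, 8.05, 8.1, 8.15, 8.2, 8.3]
f2 = [1, 1, 1, 4, 3, 5, 4, 5, 3, 1, 1, 1]

n1, m1, v1 = summarize(x1, f1)
n2, m2, v2 = summarize(x2, f2)

se1 = math.sqrt(v1 / n1)          # standard error of the first mean
se2 = math.sqrt(v2 / n2)          # standard error of the second mean

# Weight of each mean: reciprocal of its estimated variance.
w1, w2 = n1 / v1, n2 / v2
combined = (w1 * m1 + w2 * m2) / (w1 + w2)
se_combined = math.sqrt(1.0 / (w1 + w2))

print(round(m1, 3), round(se1, 3))
print(round(m2, 3), round(se2, 3))
print(round(combined, 3), round(se_combined, 3))
```

The combined estimate necessarily lies between the two separate means, nearer to that of the more precise observer.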

60. Indirect Determinations from Linear Equations.
In this case we have measurements of n linear
functions of m unknowns, where n exceeds m. Because
of observational error the equations are inconsistent. For
example, we might have

  Observations.            Weights.

  x         = 1.75            2
  x+y       = 3.10            1
  x+y+z     = 3.85            4
  y+z+u     = 4.30            2
  z+u       = 3.05            3
  u         = 2.10            1          (1)

In such a case the method of Least Squares consists
again in taking as optimal values those for which the
sum of weighted squares of residuals is a minimum, so that
for example to solve the equations (1) we would minimize

$$S^2 = 2(x-1.75)^2 + (x+y-3.10)^2 + 4(x+y+z-3.85)^2 + 2(y+z+u-4.30)^2 + 3(z+u-3.05)^2 + (u-2.10)^2 \tag{2}$$

with respect to x, y, z, u. More generally, if the equations
are (to take the case of 4 unknowns)

$$a_1 x + b_1 y + c_1 z + d_1 u = h_1, \text{ weight } w_1,$$
$$a_2 x + b_2 y + c_2 z + d_2 u = h_2, \text{ weight } w_2,$$
$$\ldots$$

we minimize

$$S^2 = \sum_{j=1}^{n} w_j (a_j x + b_j y + c_j z + d_j u - h_j)^2,$$

and similarly for any number of unknowns.

The partial derivatives $\partial S^2/\partial x$, $\partial S^2/\partial y$, $\partial S^2/\partial z$, $\partial S^2/\partial u$
must be zero; and so we derive the equations

$$(\Sigma w_j a_j^2)x + (\Sigma w_j a_j b_j)y + (\Sigma w_j a_j c_j)z + (\Sigma w_j a_j d_j)u = \Sigma w_j a_j h_j,$$
$$(\Sigma w_j a_j b_j)x + (\Sigma w_j b_j^2)y + (\Sigma w_j b_j c_j)z + (\Sigma w_j b_j d_j)u = \Sigma w_j b_j h_j,$$
$$(\Sigma w_j a_j c_j)x + (\Sigma w_j b_j c_j)y + (\Sigma w_j c_j^2)z + (\Sigma w_j c_j d_j)u = \Sigma w_j c_j h_j,$$
$$(\Sigma w_j a_j d_j)x + (\Sigma w_j b_j d_j)y + (\Sigma w_j c_j d_j)z + (\Sigma w_j d_j^2)u = \Sigma w_j d_j h_j,$$

for x, y, z, u. These are called the normal equations, and
their general form is similar to the above. Inspection will
show that the coefficients in the normal equations are
symmetrical, in the respect that the coefficient of the i-th
unknown in the j-th equation is identical with that of the j-th
unknown in the i-th equation. The scheme of coefficients
is in fact symmetrical about its northwest to southeast
diagonal. This symmetry is of great service in shortening
the solution of the equations.

Thus in our numerical example the normal equations will
be found to be

$$7x + 5y + 4z = 22.000,$$
$$5x + 7y + 6z + 2u = 27.100,$$
$$4x + 6y + 9z + 5u = 33.150, \tag{6}$$
$$2y + 5z + 6u = 19.850,$$

which can now be solved by methods of practical algebra.
The solutions are x = 1.750, y = 1.274, z = 0.846, u = 2.178.
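
The normal equations of the numerical example can be formed and solved mechanically. The sketch below is an illustration, assuming NumPy; the six observational equations and weights are read off from the sum of weighted squares that is minimized in the example, and the array names are hypothetical.

```python
import numpy as np

# Coefficients of x, y, z, u in the six observational equations.
A = np.array([
    [1, 0, 0, 0],   # x         = 1.75
    [1, 1, 0, 0],   # x + y     = 3.10
    [1, 1, 1, 0],   # x + y + z = 3.85
    [0, 1, 1, 1],   # y + z + u = 4.30
    [0, 0, 1, 1],   # z + u     = 3.05
    [0, 0, 0, 1],   # u         = 2.10
], dtype=float)
h = np.array([1.75, 3.10, 3.85, 4.30, 3.05, 2.10])
w = np.array([2, 1, 4, 2, 3, 1], dtype=float)

# Normal equations: (A' W A) X = A' W h, with W the diagonal matrix of weights.
N = A.T @ (w[:, None] * A)
rhs = A.T @ (w * h)
solution = np.linalg.solve(N, rhs)

print(N)                       # symmetric about its leading diagonal
print(rhs)
print(np.round(solution, 3))
```

The matrix `N` reproduces the symmetric scheme of coefficients, and the solution agrees with the values quoted above to within a unit in the third decimal place.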

Various schemes of systematic solution of normal 
equations have been devised, and for these the reader 
must be referred to more comprehensive treatises and 
original memoirs dealing with Least Squares or with the
numerical solution of algebraic equations. 

Preparation of Normal Equations. It is evident
from the construction of the sum $S^2$ of weighted and
squared residuals that exactly the same sum would arise
if we multiplied each observation throughout by the
square root $\sqrt{w_j}$ of its weight $w_j$, and then regarded the
observational equations as of equal unit weight. (Let the
reader verify this from the example.) Such a reduction
of a set of equations with unequal weights to a set with
equal weights is called preparing the equations.
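
The verification suggested to the reader can be sketched as follows, again assuming NumPy and using the six observational equations of the numerical example with hypothetical array names. Each equation is multiplied through by the square root of its weight, the prepared set is solved as an unweighted least-squares problem, and the result is compared with the weighted solution.

```python
import numpy as np

A = np.array([
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
], dtype=float)
h = np.array([1.75, 3.10, 3.85, 4.30, 3.05, 2.10])
w = np.array([2, 1, 4, 2, 3, 1], dtype=float)

# Prepared equations: every row and right-hand side scaled by sqrt(weight).
root_w = np.sqrt(w)
A_prepared = A * root_w[:, None]
h_prepared = h * root_w

# Unweighted least squares on the prepared equations.
sol_prepared, *_ = np.linalg.lstsq(A_prepared, h_prepared, rcond=None)

# Weighted normal equations on the original equations, for comparison.
sol_weighted = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * h))

print(np.round(sol_prepared, 3))
print(np.round(sol_weighted, 3))
```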

61. Application of Least Squares to Trivariate
Correlation. Suppose that we have n trivariate observations
$(x_j, y_j, z_j)$, as for example the height, weight and
chest measurement of each of 1000 soldiers, and that we
wish to express each variate as the best possible linear
estimate of the other two. We may suppose the variates
measured as deviations from their respective means, and
standardized. Thus for x we have n equations

$$x_j = b_{12} y_j + b_{13} z_j, \quad (j = 1, 2, \ldots, n). \tag{1}$$

These may be regarded as n observational equations in
the two unknowns $b_{12}$, $b_{13}$. If we solve them by least
squares we shall have the desired optimal relation

$$x = b_{12} y + b_{13} z, \tag{2}$$

which may be regarded geometrically as the regression
plane of x on y and z. The coefficients $b_{12}$ and $b_{13}$ are
called regression coefficients; they are the sample estimates
of ideal regression coefficients $\beta_{12}$, $\beta_{13}$ in the underlying
population. The normal equations for $b_{12}$ and $b_{13}$ are
obtained by minimizing the sum of squared residuals

$$S^2 = \Sigma (x_j - b_{12} y_j - b_{13} z_j)^2. \tag{3}$$

The minimum conditions $\partial S^2/\partial b_{12} = 0$, $\partial S^2/\partial b_{13} = 0$
give, on division by n,

$$b_{12} + r_{23} b_{13} = r_{12},$$
$$r_{23} b_{12} + b_{13} = r_{13}, \tag{4}$$

where $r_{12} = \Sigma x_j y_j/n$, $r_{13} = \Sigma x_j z_j/n$, $r_{23} = \Sigma y_j z_j/n$.
Solving, we find the desired regression coefficients as

$$b_{12} = (r_{12} - r_{13} r_{23})/(1 - r_{23}^2),$$
$$b_{13} = (r_{13} - r_{12} r_{23})/(1 - r_{23}^2), \tag{5}$$

and similar results hold for the regression of y on x and z,
and of z on x and y.
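
The solution of the two normal equations can be written as a small function. The following is a sketch only; the function name is hypothetical and the three correlations passed to it are illustrative inputs, not results taken from the text.

```python
def regression_on_two(r12, r13, r23):
    """Standardized regression coefficients of variate 1 on variates 2 and 3."""
    denom = 1.0 - r23 ** 2
    b12 = (r12 - r13 * r23) / denom
    b13 = (r13 - r12 * r23) / denom
    return b12, b13

# Illustrative total correlations (assumed values).
r12, r13, r23 = 0.80, 0.40, 0.60
b12, b13 = regression_on_two(r12, r13, r23)

# The coefficients should satisfy both normal equations.
print(b12 + r23 * b13, r12)
print(r23 * b12 + b13, r13)
```

Substituting the coefficients back into the normal equations, as in the last two lines, is a convenient check on the arithmetic.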

The standardized mean-product-deviations $r_{12}$, $r_{13}$ and
$r_{23}$ are usually called total correlation coefficients of x and
y, x and z and y and z respectively. They are really
estimates from sample of the corresponding mean-product-deviations,
or product-moments $\rho_{12}$, $\rho_{13}$ and $\rho_{23}$ in the
trivariate population or probability function.

It may be proved that the trivariate normal m.g.f.,
in standard scale and with means as origin, is

$$\exp \tfrac12(\alpha^2 + \beta^2 + \gamma^2 + 2\rho_{12}\alpha\beta + 2\rho_{13}\alpha\gamma + 2\rho_{23}\beta\gamma) \tag{6}$$

and by reversion that the corresponding trivariate normal
function is

$$\phi(x, y, z) = (2\pi)^{-3/2} \Delta^{-1/2} \exp\Big[-\frac{1}{2\Delta}\{(1-\rho_{23}^2)x^2 + (1-\rho_{13}^2)y^2 + (1-\rho_{12}^2)z^2 - 2(\rho_{12} - \rho_{13}\rho_{23})xy - 2(\rho_{13} - \rho_{12}\rho_{23})xz - 2(\rho_{23} - \rho_{12}\rho_{13})yz\}\Big], \tag{7}$$

where $\Delta$ is the determinant

$$\Delta = \begin{vmatrix} 1 & \rho_{12} & \rho_{13} \\ \rho_{12} & 1 & \rho_{23} \\ \rho_{13} & \rho_{23} & 1 \end{vmatrix}$$

of total correlations.

The equations $\partial\phi/\partial x = 0$, $\partial\phi/\partial y = 0$, $\partial\phi/\partial z = 0$ give
the loci of maximum probability of x for fixed y and z,
of y for fixed x and z, and of z for fixed x and y. By
actual differentiation we find these loci to be

$$x = \beta_{12} y + \beta_{13} z \tag{8}$$

and two others, where

$$\beta_{12} = (\rho_{12} - \rho_{13}\rho_{23})/(1 - \rho_{23}^2),$$
$$\beta_{13} = (\rho_{13} - \rho_{12}\rho_{23})/(1 - \rho_{23}^2). \tag{9}$$

Thus we see that the estimates of regression by Least
Squares are in agreement with those based on normal
trivariate correlation. A corresponding result is true for
linear regression in any number of variates.


62. Partial Correlation. The unstandardized equations,
with means as origin, of the regression lines in
bivariate regression (§49) are

$$x = \beta_{12} y, \text{ where } \beta_{12} = \rho\sigma_1/\sigma_2,$$
$$y = \beta_{21} x, \text{ where } \beta_{21} = \rho\sigma_2/\sigma_1. \tag{1}$$

The correlation coefficient $\rho$ appears here as the
geometric mean $(\beta_{12}\beta_{21})^{1/2}$. On the analogy of this, partial
correlation coefficients in multivariate problems have been
defined as the geometric means of the corresponding
regression coefficients. For example, the partial coefficient
of two variables $x_h$ and $x_k$ would be defined by $(\beta_{hk}\beta_{kh})^{1/2}$.

Notation. It is customary to denote, for example in
a four-variate problem, the partial correlation coefficient
of x and y by $\rho_{12.34}$, to distinguish it from the total
correlation coefficient $\rho_{12}$. The sample estimate would be
written $r_{12.34}$.

Example. Given the following estimates of variances and
total correlations of three variables x, y, z, find the three
regression equations and the three estimates of partial
correlation coefficients:

$$\sigma_1^2 = 5.0,\quad \sigma_2^2 = 7.0,\quad \sigma_3^2 = 3.0,\quad r_{12} = 0.80,\quad r_{13} = 0.40,\quad r_{23} = 0.60.$$
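
A sketch of the partial correlations for three variates follows. It takes the definition given above, the geometric mean of the two corresponding regression coefficients, with the sign of those coefficients attached. The function name is hypothetical, and the inputs $r_{12}$ = 0.80, $r_{13}$ = 0.40, $r_{23}$ = 0.60 are used purely as illustrative values.

```python
import math

def partial_corr(r_ab, r_ac, r_bc):
    """Partial correlation of a and b, the third variate c being held fixed."""
    num = r_ab - r_ac * r_bc
    b_ab = num / (1.0 - r_bc ** 2)   # coefficient of b in the regression of a on b, c
    b_ba = num / (1.0 - r_ac ** 2)   # coefficient of a in the regression of b on a, c
    # Geometric mean of the two coefficients, carrying their common sign.
    return math.copysign(math.sqrt(b_ab * b_ba), num)

r12, r13, r23 = 0.80, 0.40, 0.60     # illustrative inputs (assumed)

r12_3 = partial_corr(r12, r13, r23)  # variates 1 and 2, variate 3 fixed
r13_2 = partial_corr(r13, r12, r23)  # variates 1 and 3, variate 2 fixed
r23_1 = partial_corr(r23, r12, r13)  # variates 2 and 3, variate 1 fixed

print(round(r12_3, 3), round(r13_2, 3), round(r23_1, 3))
```

The unstandardized regression equations follow by multiplying each standardized coefficient by the ratio of the appropriate standard deviations, as in the bivariate case.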

63. Non-Linear Regression : Polynomial Regression.
From the nature of a set of observations of a
variate y dependent on x it may be apparent that the
regression cannot be linear. Common types of non-linear
regression are those in which the underlying functional
relation of y and x is of polynomial, or of harmonic type.

The polynomial regression

$$y = c_0 + c_1 x + c_2 x^2 + \ldots + c_k x^k \tag{1}$$

will be considered first in its simplest case, the fitting
of the polynomial by Least Squares to n independent
observations $u_x$ of equal weight, corresponding to n
equispaced values of x, namely x = 0, 1, 2, ..., n−1.

The polynomial of best fit is given by the minimum
of the sum of squared residuals

$$S^2 = \Sigma (u - c_0 - c_1 x - c_2 x^2 - \ldots - c_k x^k)^2, \tag{2}$$

that is, by the conditions $\partial S^2/\partial c_j = 0$. These give k+1
normal equations for the $c_j$, easily seen to be expressible as

$$\Sigma x^j (u_x - y_x) = 0, \quad (j = 0, 1, 2, \ldots, k), \tag{3}$$
displaying the fact that the fitting of a polynomial of degree
k by Least Squares is equivalent to equating the moments
of orders 0, 1, 2, ..., k of the polynomial and the data.

The values of the coefficients can be found by solving
the normal equations, but since sums of powers of natural
numbers up to the 2k-th are required, the method becomes
laborious if n is large and if the polynomial y is of the
3rd or higher degree. For this reason it is better to
express y not in powers of x, but in polynomials
$1, t_1(x), t_2(x), \ldots, t_k(x)$ having the property of being
uncorrelated, being in fact such that the product sum

$$\sum_x t_i(x) t_j(x) = 0 \text{ if } i \neq j. \tag{4}$$

These polynomials $t_j(x)$ are familiar in mathematics as
the orthogonal polynomials of Tchebychef, and their
properties are known. For example, it is known that

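$$t_j(x) = (2j)_{(j)} x_{(j)} - (2j-1)_{(j-1)} (n-j)_{(1)} x_{(j-1)} + (2j-2)_{(j-2)} (n-j+1)_{(2)} x_{(j-2)} - \ldots + (-1)^j (n-1)_{(j)}, \tag{5}$$

so that (Appendix, 1) the difference

$$\Delta^s t_j(x) = (2j)_{(j)} x_{(j-s)} - (2j-1)_{(j-1)} (n-j)_{(1)} x_{(j-s-1)} + \ldots + (-1)^{j-s} (j+s)_{(s)} (n-s-1)_{(j-s)}. \tag{6}$$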

It is also known that

$$\sum_x (t_j(x))^2 = (2j)_{(j)} (n+j)_{(2j+1)}. \tag{7}$$

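If, therefore, we express y in the form

$$y = a_0 t_0(x) + a_1 t_1(x) + \ldots + a_k t_k(x), \tag{8}$$

the sum of squared residuals, because of the vanishing
of the product terms, takes the form

$$S^2 = \sum_x u_x^2 - 2\sum_{j=0}^{k} a_j \sum_x u_x t_j(x) + \sum_{j=0}^{k} a_j^2 \sum_x (t_j(x))^2, \tag{9}$$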
and the normal equations $\partial S^2/\partial a_j = 0$ therefore take the
form

$$a_j = \sum_x u_x t_j(x) \Big/ \sum_x (t_j(x))^2, \quad (j = 0, 1, \ldots, k). \tag{10}$$

Thus each coefficient in the regression is found independently
of the others, without the labour of solving
simultaneous equations. (The choice of uncorrelated or
orthogonal functions for the representation of y always
confers this very great advantage.) Since the polynomials
$t_j(x)$ are expressed in factorials $x_{(r)}$, the numerator of the
expression for $a_j$ can easily be found in terms of the
factorial moments of the data $u_x$, these moments being
obtained as usual (Appendix, 2) by summation.

The minimum sum of squared residuals can itself be
evaluated beforehand, for by (9) and (10) it takes the form

$$S^2 = \sum_x u_x^2 - \sum_{j=0}^{k} a_j \sum_x u_x t_j(x), \tag{11}$$

involving the sum of the squares of the $u_x$, diminished
by the product of each successive $a_j$ by the numerator
in (10). It is known that the variance of a single residual
is best estimated by dividing the sum of the n squared
residuals by the degrees of freedom, n−k−1; hence we
can judge beforehand, if we know the precision of the data,
what value of k gives the best polynomial y. It is of
course possible, by taking too many terms in the polynomial
y, to fit the data too well, in the sense that the sum of
squared residuals is much smaller than that warranted by
the precision of the data.

64. Practical Routine of Fitting a Polynomial.
All of the above points, which can be treated only briefly
here, have been discussed at length in special memoirs.
We shall merely illustrate a method depending on the
theory of §63 and making use of a table containing the
terminal values and differences $t_j(0)$, $\Delta t_j(0)$, $\Delta^2 t_j(0)$, ...,
for j = 0, 1, 2, 3, ..., k and the particular value of n.
The rule for constructing such a table follows from (5)
and (6) and is simple. We shall illustrate it for n = 6.
We write down the fixed table of binomial coefficients,
table (i) below, to k+1 columns; in the illustration, k = 3.
Beside table (i) we place table (ii), consisting of binomial
coefficients of n−1, n−2, ... written below each other as
shown, also to k+1 columns. The products of corresponding
entries in the two tables now give us the desired
table (iii) of terminal values and differences of t-polynomials,
and at the feet of the respective columns we enter the
values of $\Sigma t_j^2$ as computed from the formula §63 (7).

        (i)                        (ii)
   1   -1    1   -1           1    5   10   10
        2   -3    4                1    4    6
              6  -10                    1    3
                  20                         1

       (iii)                       (iv)
   1   -5   10  -10           1   -5    5   -5
        2  -12   24                2   -6   12
              6  -30                    3  -15
                  20                        10

   6   70  336  720           6   70   84  180


A possibility making table (iii) still simpler for practical
use is that when a common integer factor is observed in
any column, we may cancel through by that factor,
provided that the square of that factor is cancelled through
from $\Sigma t_j^2$. Thus the cancelling of factors 2, 2 from columns
3, 4 in table (iii) above gives table (iv). Such tables,
extended to six or seven columns, are easily constructed
for a proposed value of n.

The use of the table in finding the regression coefficients
$a_j$ and the fitted values $y_x$ is best illustrated by an actual
worked example. The process is no more difficult for a
long series of data than for a short, but to economize in
space we shall illustrate it by fitting a cubic polynomial to
six values $u_x$.

Example.

  x      0    1    2    3    4    5
  u_x    5   13   25   60  105  200

By summation the reduced factorial moments of u are
found to be 408, 1663, 2835 and 2480, while $\Sigma u^2$ = 55444.

Using four columns only of the table of polynomials (since
we are fitting a cubic) we set out the rest of the work in
compact shape thus:


  Numerators    408    1286     567     191
  a_j            68   18.371   6.75   1.0611

                                      Sums
     1     -5      5     -5            408     y_0     =  4.590
            2     -6     12           1663     Δy_0    =  8.975
                   3    -15           2835     Δ²y_0   =  4.334
                         10           2480     Δ³y_0   = 10.611

     6     70     84    180

  Check: y_5 = 198.91.


Explanation.

  a_0 = (408×1)/6 = 68,
  a_1 = (1663×2 − 408×5)/70 = 1286/70 = 18.371,
  a_2 = (2835×3 − 1663×6 + 408×5)/84 = 567/84 = 6.75,

and so on; the elements in columns of the table are used as
multipliers of the factorial moments, the entries at the feet of
the columns as divisors. Then

  y_0   = 68×1 − 18.371×5 + 6.75×5 − 1.0611×5 = 4.590,
  Δy_0  = 18.371×2 − 6.75×6 + 1.0611×12 = 8.975,
  Δ²y_0 = 6.75×3 − 1.0611×15 = 4.334,

and so on; the elements in rows are now used as multipliers
of the $a_j$ and give the terminal value $y_0$ and its differences.
There is also a good check on the other terminal value,

  y_5 = 68×1 + 18.371×5 + 6.75×5 + 1.0611×5 = 198.91,

the same terms as gave $y_0$, but with positive multipliers.

Building up a difference table of the $y_x$ from the constant
3rd differences in the way familiar in interpolation, we have:

  x       y        Δy       Δ²y      Δ³y       u      u−y
  0     4.590    8.975    4.334   10.611      5     0.410
  1    13.565   13.309   14.945   10.611     13    -0.565
  2    26.874   28.254   25.556   10.611     25    -1.874
  3    55.128   53.810   36.167              60     4.872
  4   108.938   89.977                      105    -3.938
  5   198.915                               200     1.085


The comparison of the fitted values with the data can be
seen in the columns headed y and u. The sum of squared
residuals $(u-y)^2$ will be found to be 44.4.
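
The routine lends itself to a short program. The sketch below is an illustration, not the author's procedure verbatim: the closed form used for the entries of the table of terminal differences, $(-1)^{j-r}\binom{j+r}{r}\binom{n-1-r}{j-r}$, is read off from the two tables of binomial coefficients in the worked illustration for n = 6 and is an assumption for other n; the function names are hypothetical. Common factors are not cancelled, so the coefficients correspond to table (iii), and those of the quadratic and cubic terms are half the values obtained with table (iv); the fitted values are unaffected.

```python
from math import comb

def t_differences(n, k):
    """Entry [r][j] is the r-th difference of t_j at x = 0 (uncancelled)."""
    return [
        [(-1) ** (j - r) * comb(j + r, r) * comb(n - 1 - r, j - r) if j >= r else 0
         for j in range(k + 1)]
        for r in range(k + 1)
    ]

def fit(u, k):
    n = len(u)
    d = t_differences(n, k)
    # Values of t_j(x) by Newton's forward formula: sum over r of d[r][j] * C(x, r).
    t = [[sum(d[r][j] * comb(x, r) for r in range(j + 1)) for x in range(n)]
         for j in range(k + 1)]
    norms = [sum(v * v for v in tj) for tj in t]                  # sums of t_j squared
    nums = [sum(ux * tx for ux, tx in zip(u, tj)) for tj in t]    # numerators of a_j
    a = [num / norm for num, norm in zip(nums, norms)]
    fitted = [sum(a[j] * t[j][x] for j in range(k + 1)) for x in range(n)]
    return norms, nums, a, fitted

u = [5, 13, 25, 60, 105, 200]
norms, nums, a, fitted = fit(u, 3)
rss = sum((ux - yx) ** 2 for ux, yx in zip(u, fitted))

print(norms)
print([round(y, 3) for y in fitted])
print(round(rss, 1))
```

Since each $a_j$ is computed separately, fitting a parabola instead of a cubic needs only `fit(u, 2)`; the first three coefficients are unchanged.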

But we can also set out a table thus, estimating by §63 (11)
the variance of a residual after a constant, a straight line,
a parabola and our cubic are fitted in succession:

  k   n−k−1     a_j     num. of a_j    prod.      S²     S²/(n−k−1)
                                                55444
  0     5       68          408        27744    27700      5540
  1     4     18.371       1286        23625     4075      1019
  2     3      6.75         567         3827      248        83
  3     2      1.0611       191          203       45        22


The column headed $S^2$ shows the sum of squared residuals,
obtained in accordance with §63 (11) by subtracting the
entries in the previous column in turn from $\Sigma u^2$ = 55444.
The last column gives estimates of the variance of a single
residual at the different stages. To test which polynomial
best represents the data, we must have a preliminary
knowledge or estimate of the variance of the observations.
This variance is compared with the residual variance in
the light of the sampling distributions of §§71 and 74.

The alternative computation of the sum of squared residuals
as 45 checks the work, for the same sum was given by the
fitted values as 44.4.
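
The successive sums of squared residuals can also be checked independently by fitting ordinary polynomials of degrees 0 to 3 directly. The sketch below assumes NumPy and uses `numpy.polyfit` in place of the t-polynomial routine, so it is a cross-check rather than the method of the text. Small differences from the table arise because the table rounds the products to whole numbers.

```python
import numpy as np

x = np.arange(6)
u = np.array([5, 13, 25, 60, 105, 200], dtype=float)
n = len(u)

rss_by_degree = []
for k in range(4):
    coeffs = np.polyfit(x, u, k)              # least-squares polynomial of degree k
    resid = u - np.polyval(coeffs, x)
    rss = float(resid @ resid)                # sum of squared residuals
    rss_by_degree.append(rss)
    # Residual variance estimated with n - k - 1 degrees of freedom.
    print(k, n - k - 1, round(rss, 1), round(rss / (n - k - 1), 1))
```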

For a given value of n the same table of terminal values
and differences of the Tchebychef polynomials serves for fitting a
polynomial of any degree. Thus, using only the first three
columns of the table in the above worked example, we
may fit a parabola instead of a cubic. It will be found
instructive to do this, following the details of the worked
example. Notice that the coefficients $a_0$, $a_1$, $a_2$ are the
same as before.


Example. Fit a cubic polynomial to the seven equidistant
and equally weighted data

  x     0    1    2    3    4    5    6
  u   -11    5   13   25   60  105  200

65. Periodic Regressions : Observations of Equal 
Weight. Observations which exhibit periodicity more or 
less masked by accidental error are of common occurrence. 
The height of tide-water at a seaport, measured at equal
intervals of time, shows such a periodicity ; monthly 
averages of temperature show a seasonal periodicity ; 
telephone calls on an Exchange show a weekly periodicity. 

The procedure for analysing periodicity is to assume a
periodic function

$$y_\theta = a_0 + a_1\cos\theta + a_2\cos 2\theta + \dots + a_k\cos k\theta + b_1\sin\theta + b_2\sin 2\theta + \dots + b_k\sin k\theta$$

and to find the coefficients $a_j$ and $b_j$ of the constituent
periodic terms by the method of Least Squares.

We consider therefore n equally spaced observations
$u_\theta$ of equal weight, where $\theta = 0, 2\pi/n, 4\pi/n, \dots, 2(n-1)\pi/n$,
the observations thus corresponding to the n phase-angles
of one complete oscillation of a periodic phenomenon.
The initial observation of a second oscillation is not
included. In view of the trigonometrical relations

$$\sum_r \cos\frac{2rh\pi}{n}\cos\frac{2rj\pi}{n} = 0 \ \text{ if } h \neq j, \qquad = \tfrac{1}{2}n \ \text{ if } h = j \neq 0,$$

and the similar ones with one or both cosines replaced by
sines (these are really orthogonal relations exactly
resembling those of the Tchebychef polynomials in 63 (4)
and (7)), the sum $S^2$ of squared residuals is (cf. 63 (9))

$$\sum(u_\theta - y_\theta)^2 = \sum u_\theta^2 - 2\sum u_\theta(a_0 + a_1\cos\theta + \dots + b_k\sin k\theta) + \tfrac{1}{2}n(2a_0^2 + a_1^2 + b_1^2 + \dots + a_k^2 + b_k^2).$$

Differentiating with respect to the $a_j$ and $b_j$ and
equating to zero, we have the normal equations for the
regression coefficients. Each is given independently of
the others:

$$a_0 = \frac{1}{n}\sum_\theta u_\theta, \qquad a_j = \frac{2}{n}\sum_\theta u_\theta\cos j\theta, \qquad b_j = \frac{2}{n}\sum_\theta u_\theta\sin j\theta. \qquad (4)$$

If n is even,

$$a_{n/2} = \frac{1}{n}\sum_\theta u_\theta\cos\tfrac{1}{2}n\theta, \qquad b_{n/2} = 0, \qquad (5)$$

and $\cos\frac{1}{2}n\theta$ is +1 and -1 alternately as $\theta$ takes its n
values.

The theoretical solution is thus immediate. Simplicity
of practical application will depend on the value of n and
the consequent values of $\cos j\theta$ and $\sin j\theta$.

66. Practical Solution of the Normal Equations. 

The process of numerical solution becomes specially simple
when $\theta$, that is, $2\pi/n$, is such that $\cos j\theta$ and $\sin j\theta$ are
easy to handle. This occurs when n = 4, 6, 8, 12 or 24,
the last two cases being specially important, as
corresponding to the hourly or two-hourly subdivision of the
day; and special routines for these values of n have been
devised.

The procedure depends on the fact that in the four
quadrants, from $\theta = 0$ to $\theta = 2\pi$, $\cos\theta$ and $\sin\theta$ take the
same absolute values four times, though with differing
alternations of sign. To take the case n = 12 for
illustration, the data (and there will be no misunderstanding
if these are written meanwhile as $u_0, u_1, \dots, u_{11}$) can be
assembled in tetrads, for example $u_1, u_5, u_7, u_{11}$, before
being multiplied by the suitable values of $\cos h\theta$ and
$\sin h\theta$, where $h\theta$ can always be taken as coterminous with
some angle in the first quadrant.

We shall indicate how this is done by an actual example. 

Example. To fit terms as far as $a_4\cos 4\theta$, $b_4\sin 4\theta$ to the
12 data (Whittaker and Robinson, Calculus of Observations,
p. 272):

  u_0   u_1   u_2   u_3   u_4   u_5   u_6   u_7   u_8   u_9   u_10  u_11
  2.71  3.04  2.13  1.27  0.79  0.50  0.37  0.54  0.19 -0.35 -0.44  0.77

First write the data (in units of 0.01) in a scheme U of columns,
down, up, down, up, with blanks as indicated by the dots, as follows:

            U                         M
   271    37     .     .          +  +  +  +
   304    50    54    77          +  -  -  +
   213    79    19   -44          +  -  +  -
   127     .   -35     .          +  +  -  -

Next, add along the rows of the scheme U, after giving
sign to the columns of U in four different ways, according
to the rows in the sign-scheme M. We thus obtain four
separate sets of totals, and these are combined with cosines
and sines of 0°, 30°, 60°, 90° in four separate schemes, as
below. (We have included the coefficients necessary for
computing $a_5$, $b_5$ and $a_6$ as well.)

  Totals     a_0     a_2     a_4     a_6
   308       0.5     1       1       0.5
   485       0.5     0.5    -0.5    -0.5
   267       0.5    -0.5    -0.5     0.5
    92       0.5    -1       1      -0.5
   6a_j      576     325      24      -1

  Totals     a_1     a_3     a_5
   234       1       1       1
   277       0.866   0      -0.866
    71       0.5    -1       0.5
   162       0       0       0
   6a_j     509.4    163     26.6

  Totals     b_2     b_4
   234       0       0
   231       0.866   0.866
   197       0.866  -0.866
    92       0       0
   6b_j     370.6    29.4
   b_j      0.618    0.049

  Totals     b_1     b_3     b_5
   308       0       0       0
   223       0.5     1       0.5
   317       0.866   0      -0.866
   162       1      -1       1
   6b_j      548      61     -0.5
   b_j      0.913    0.102  -0.001
Explanation.

$a_1 = (2.34\times 1 + 2.77\times 0.866 + 0.71\times 0.5 + 1.62\times 0)/6 = 0.849$, etc.

Hence, as far as terms in $\cos 4\theta$, $\sin 4\theta$, the regression is

$$y = 0.960 + 0.849\cos\theta + 0.542\cos 2\theta + 0.272\cos 3\theta + 0.040\cos 4\theta + 0.913\sin\theta + 0.618\sin 2\theta + 0.102\sin 3\theta + 0.049\sin 4\theta,$$

and the regressions to fewer or more terms involve the
same coefficients $a_j$, $b_j$ as are given by the above scheme
of solution.
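
As a check on the tetrad scheme, the coefficients can also be computed directly from the normal-equation solutions of 65 (4). The sketch below is illustrative and not part of the original routine; `harmonic_coefficients` is a hypothetical helper name, and the formula it uses for $a_j$, $b_j$ assumes $0 < j < \frac{1}{2}n$.

```python
import math

def harmonic_coefficients(u, k):
    """Least-squares coefficients a_0..a_k and b_1..b_k for n equally
    spaced data u (valid for k < n/2)."""
    n = len(u)
    thetas = [2 * math.pi * r / n for r in range(n)]
    a = [sum(u) / n]
    b = [0.0]  # b_0 is not used; kept so that indices match j
    for j in range(1, k + 1):
        a.append(2 / n * sum(v * math.cos(j * t) for v, t in zip(u, thetas)))
        b.append(2 / n * sum(v * math.sin(j * t) for v, t in zip(u, thetas)))
    return a, b

# The twelve data of the worked example.
u = [2.71, 3.04, 2.13, 1.27, 0.79, 0.50,
     0.37, 0.54, 0.19, -0.35, -0.44, 0.77]
a, b = harmonic_coefficients(u, 4)
print([round(v, 3) for v in a])
print([round(v, 3) for v in b[1:]])
```

The direct sums cost more multiplications than the tetrad scheme, but they make a convenient independent check of the hand computation.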

The sum of squared residuals may also be calculated
beforehand from the regression coefficients in a scheme set
out as follows:

  j   n-2j-1   n a_0² and ½n(a_j²+b_j²)      S²      S²/(n-2j-1)
                                           24.983
  0     11            11.059               13.924       1.266
  1      9             9.326                4.598       0.511
  2      7             4.054                0.544       0.078
  3      5             0.506                0.038       0.008
  4      3             0.024                0.014       0.005


Just as in polynomial regression, the contributions to the
sum of squared residuals produced by successive terms are
subtracted in turn from $\Sigma u^2$, which here is 24.983. The estimate
of variance of a single residual is then made by dividing the
residual sum of squares by $n-2j-1$, the number of degrees
of freedom. The results are shown in the last column.
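
The scheme can be reproduced as follows. This is a sketch under the assumption that the coefficients are recomputed from the data by the formulas of 65 (4) rather than taken as the rounded values of the text, so the later entries may differ from the printed table in the third decimal place.

```python
import math

u = [2.71, 3.04, 2.13, 1.27, 0.79, 0.50,
     0.37, 0.54, 0.19, -0.35, -0.44, 0.77]
n = len(u)
thetas = [2 * math.pi * r / n for r in range(n)]

remaining = sum(v * v for v in u)        # sum of u^2
a0 = sum(u) / n
remaining -= n * a0 ** 2                 # contribution of the constant term
rows = [(0, n - 1, remaining, remaining / (n - 1))]

for j in range(1, 5):
    a_j = 2 / n * sum(v * math.cos(j * t) for v, t in zip(u, thetas))
    b_j = 2 / n * sum(v * math.sin(j * t) for v, t in zip(u, thetas))
    remaining -= 0.5 * n * (a_j ** 2 + b_j ** 2)   # contribution of the pair
    dof = n - 2 * j - 1                  # two coefficients lost per harmonic
    rows.append((j, dof, remaining, remaining / dof))

for j, dof, s2, variance in rows:
    print(j, dof, round(s2, 3), round(variance, 3))
```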

67. General Regressions. After what has preceded,
the routine to be adopted in other regressions, such as

$$y = a_1\tan\theta + a_2\tan 2\theta + \dots + a_k\tan k\theta \qquad (1)$$

will be readily understood. Such regressions are not
common in statistical work, but they are not outside the
bounds of possibility. The desirable thing in any problem
of regression will be to express y, if possible, in terms of
functions which, like the Tchebychef polynomials or the
sine and cosine of multiples of $\theta$, have the orthogonal
property that the product-sums of different functions of
the set over the range vanish. The effective meaning of
this is that the contributions of the successive terms to
the regression are uncorrelated with each other.

Harmonic Analysis. For fuller details concerning the
practical routine of estimating periodic regressions, the
reader may consult the chapters on harmonic analysis in
Whittaker and Robinson’s Calculus of Observations, or
Brunt’s Combination of Observations, 2nd edition, 1931.



CHAPTER VII


PROBABILITY DISTRIBUTIONS OF STATISTICAL 
COEFFICIENTS 

68. Sampling Distributions. A statistical coefficient
computed from a sample of n values, univariate or
multivariate, is only an estimate of the corresponding parameter
in the population or underlying probability function. It
is therefore to be presumed erroneous, though the degree
of error cannot be affirmed exactly, since the true value of
the parameter is not known. The degree of error can be
stated only in terms of probability; and the probability
distributions involved are (i) the hypothetical population,
or distribution of the variate or variates, (ii) the derived
distribution of the coefficient of estimate from sample.
The second of these is called the sampling distribution of
the coefficient.

Let us consider a case in which the first of these
distributions, the probability distribution of the variate,
is not hypothetical but given. In Charlier’s experiment
(22) of drawing 10 cards from a pack, with replacement
of each card, and continuing this until a sample of 1000
sets of 10 cards had been collected, the variate was the
number x of black cards in a set of 10, and its probability
distribution was the binomial distribution, with mean 5
and variance 2.5; the corresponding values of mean and
mean square deviation in Charlier’s sample were 4.933
and 2.415. Are the respective deviations 4.933 - 5, or
-0.067, and 2.415 - 2.5, or -0.085, reasonable or
abnormal? Such questions can be answered only when
the sampling distributions of the estimates of mean and
variance are known.

The nature and genesis of these sampling distributions
can be illustrated from this same example. The sample
group of 1000 sets of 10 card drawings was merely one
out of an enormous number of equally possible groups.
From the pack of 52 cards the 10 cards, drawn one at
a time with replacement, could eventuate, if order of
drawing were taken into account, in $52^{10}$ ways. This is
an unimaginably large number, but the number of groups
of 1000 sets which may be chosen from these $52^{10}$ sets
is incomparably greater still. Each group may be
supposed to have its mean m and mean square deviation
$s^2$, computable in the usual way. The aggregates of these
values of m and $s^2$ constitute probability distributions,
and these are the sampling distributions of m and $s^2$ for
the kind of sample in question.

Example. If the parent population is normal and the
number in sample is n, the sampling variances of the
estimates $m_2$, $m_3$, $m_4$ of the moments $\mu_2$, $\mu_3$, $\mu_4$ are respectively
$2\sigma^4/n$, $6\sigma^6/n$, $96\sigma^8/n$. For $\mu_5$, $\mu_6$, ... they increase rapidly.

The functional form of a sampling distribution depends
(i) on the population (probability function of the variate
or variates sampled), (ii) on the function used for estimating
the parameter, and (iii) on n, the number of observations
in the sample. Since 1900, and especially since 1915,
much research has been expended on the problem of
deriving the probability distributions of the commoner
coefficients. Most of this research has been devoted to
samples of a normally distributed variate or variates, and
the sampling distributions are now well known and already
classic. It appears that as the number n in sample
increases the sampling distributions of many coefficients,
though by no means of all, tend themselves towards the
normal type. In such cases it is customary to supply an
estimate of the precision of a coefficient by appending to
its computed value its standard deviation of sampling, or
standard error; and this is sometimes said to imply a
probability of 19/20 that the true value lies in the range
delimited by twice the standard error on either side of
the computed value.

The form of statement is not strictly accurate; if, for
example, a computed mean is m, and the central 95 per cent
range of sampling probability area is the criterion of what is
acceptable, then m may be anywhere from the extreme left
of the 95 per cent range of a sampling distribution centred
on a hypothetical mean $\mu'$ to the extreme right of the 95 per
cent range of a second sampling distribution centred on a
hypothetical mean $\mu''$; but these are different distributions,
and it does not follow either that the left half-range $\mu' - m$
of the first is equal to the right half-range $m - \mu''$ of the
second, or that we can add the probabilities, under the different
hypotheses, that the true $\mu$ lies in these respective half-ranges.
In fact, $\mu$ being an unknown parameter, an ordinary direct
statement of probability cannot be made.

When the number in sample n is small the sampling
distribution of the coefficient is often of non-normal, skew
or platykurtic type, and the standard error is an insufficient
indication of the interval within which the true value of
the parameter may lie. It is necessary in such a case to
know the sampling distribution and probability integral
of the special coefficient.

69. The Sampling Distribution of Means. In a
few cases the sampling probability function of the mean
of n observations is of the same type as the probability
function of the population. For example, the normal
probability function with mean $\mu$ and variance $\sigma^2$,

$$\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}(x-\mu)^2/\sigma^2}, \qquad (1)$$

has m.g.f. $\exp(\mu\alpha + \frac{1}{2}\sigma^2\alpha^2)$. Hence the m.g.f. of the sum
of n sample values is $\exp(n\mu\alpha + \frac{1}{2}n\sigma^2\alpha^2)$. To change
from sum to mean is to write $x/n$ for x, or $\alpha/n$ for $\alpha$. Hence
the m.g.f. of the mean of sample is $\exp(\mu\alpha + \frac{1}{2}\sigma^2\alpha^2/n)$.
The mean of sample is thus distributed normally about
the same mean as before, but with variance $\sigma^2/n$, or
standard error $\sigma/\sqrt{n}$.

Ex. 1. Prove that if $\lambda_1, \lambda_2, \lambda_3, \dots$ are the seminvariants
of any population, the seminvariants of the mean of a sample
of n are $\lambda_1, \lambda_2/n, \lambda_3/n^2, \dots$

Ex. 2. The number x of black cards in a set of 10 in
Charlier’s experiment is binomially distributed with mean 5
and variance 2.5. The mean of x in 1000 sets is distributed
with approximate normality, about mean 5, and with variance
2.5/1000, or 0.0025. The standard error is thus 0.05. The
deviation of the mean 4.933 of Charlier’s sample from 5 is
-0.067, about 4/3 of the standard error.

The deviation is not excessive. From the table of the
normal probability integral on p. 144 it is seen that the
probability of a deviation exceeding $1.34\sigma$ is about 0.18.
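
The arithmetic of Ex. 2 can be checked without tables. This is a small sketch, not part of the original text; it uses the complementary error function from Python's standard library in place of the printed table of the normal probability integral.

```python
import math

variance_x = 10 * 0.5 * 0.5          # binomial variance n*p*q for 10 cards
n_sets = 1000
standard_error = math.sqrt(variance_x / n_sets)

deviation = 4.933 - 5                # Charlier's sample mean minus true mean
ratio = abs(deviation) / standard_error

# Two-sided normal tail: probability of a deviation exceeding ratio * sigma.
two_sided_tail = math.erfc(ratio / math.sqrt(2))

print(round(standard_error, 3), round(ratio, 2), round(two_sided_tail, 2))
```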

Again, if the probability function of x is of Gamma or
so-called $\chi^2$ type, namely

$$\phi(x) = (\Gamma(k))^{-1}x^{k-1}e^{-x}, \qquad (2)$$

the m.g.f. is

$$(\Gamma(k))^{-1}\int_0^\infty x^{k-1}e^{-x}e^{\alpha x}\,dx = (1-\alpha)^{-k}. \qquad (3)$$

The m.g.f. of the sum of n sample values $x_j$ is $(1-\alpha)^{-nk}$,
and so the m.g.f. of m, the sample mean, is $(1-\alpha/n)^{-nk}$.
Reverting to the probability function, which by a theorem
of Lerch is unique, we obtain the probability function
of m as

$$\phi(m) = n(\Gamma(nk))^{-1}(mn)^{nk-1}e^{-mn}. \qquad (4)$$

This is again of Pearson’s Type III.

Ex. 3. Prove that the distribution of the sum (not the
mean) of n values $x_j$ each obeying the Poissonian law $\psi(x)$ of
33 is Poissonian. (Use the f.m.g.f. of x.)
70. Distribution of Mean Square in Normal
Sample. If the probability differential of x is

$$dp = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2}\,dx, \qquad (1)$$

that of $z = \frac{1}{2}x^2$, as in 37 (2), is

$$dp = \frac{1}{\sqrt{\pi}}\,z^{-\frac{1}{2}}e^{-z}\,dz, \qquad (2)$$

and so the m.g.f. of z is $(1-\alpha)^{-\frac{1}{2}}$ by 69 (3). Hence the
m.g.f. of half the sum of the squares of n sample values
is $(1-\alpha)^{-\frac{1}{2}n}$, and so if $s^2$ is the mean of the squares the
m.g.f. of $\frac{1}{2}s^2$ is $(1-\alpha/n)^{-\frac{1}{2}n}$. It follows that the probability
differential of u, where $u = \frac{1}{2}s^2$, is

$$dp = n(\Gamma(\tfrac{1}{2}n))^{-1}(nu)^{\frac{1}{2}n-1}e^{-nu}\,du, \qquad (3)$$

again of $\chi^2$ type; changing from u to $s^2$ we have
the probability function of $s^2$, namely

$$\phi(s^2) = \tfrac{1}{2}n(\Gamma(\tfrac{1}{2}n))^{-1}(\tfrac{1}{2}ns^2)^{\frac{1}{2}n-1}e^{-\frac{1}{2}ns^2}. \qquad (4)$$

In unstandardized units we must write $s^2/\sigma^2$ for $s^2$ on
the right of (4), and insert the factor $1/\sigma^2$.

The seminvariant g.f. of $s^2$ is

$$-\tfrac{1}{2}n\log(1-2\alpha/n) = \tfrac{1}{2}n(2\alpha/n + 4\alpha^2/2n^2 + \dots) = \alpha + 2\alpha^2/2!\,n + \dots \qquad (5)$$

Thus the mean of $s^2$ is 1 and the variance is $2/n$; in
unstandardized units these are $\sigma^2$ and $2\sigma^4/n$, where $\sigma^2$
is the variance of x. The s.g.f. also shows that as n
increases the m.g.f. of $s^2$ tends to asymptotic equivalence
with $\exp(\alpha + \alpha^2/n)$; and so the distribution of $s^2$ tends
to normality.

Example. The distribution of $s^2$ in Charlier’s 1000 sets is
almost normal; $\sigma^2 = 2.5$, and $s^2$ computed from the sample
(using deviations not from Charlier’s mean m = 4.933, but
from $\mu = 5$) is 2.419. The standard error of $s^2$ is $\sigma^2\sqrt{2/n}$
$= 2.5/\sqrt{500} = 0.112$. The actual deviation, -0.081, is
numerically about three quarters of this.
71. Distribution of Estimate of Variance. The
variance or second moment of the population is commonly
estimated from the sample by taking the nth part of the
sum of squared deviations of the sample values from
the sample mean m.

Of the n deviations from m only n-1 are independent,
and this estimate of $\sigma^2$, which we shall call $s^2$, though
pointing out that it is not the same as $s^2$ in 70, can be
expressed as a quadratic expression in n-1 independent
values. Thus we have, by 14 (5),

$$(x_1^2 + x_2^2 + \dots + x_n^2)/n - (x_1 + x_2 + \dots + x_n)^2/n^2 = (n-1)(z_1^2 + z_2^2 + \dots + z_{n-1}^2)/n^2 - 2(z_1z_2 + z_1z_3 + \dots + z_{n-2}z_{n-1})/n^2, \qquad (1)$$

where $z_1 = x_1 - x_n$, $z_2 = x_2 - x_n$, ..., $z_{n-1} = x_{n-1} - x_n$, and
this is but one of many ways in which $s^2$ may be expressed
in terms of only n-1 variables. The $z_j$ here are linearly
independent, though correlated by possessing the term
$-x_n$ in common. (See Appendix, 5.)

This loss of a degree of freedom, for that is what it is,
complicates the problem of finding the distribution of $s^2$,
but its m.g.f. can be evaluated as a multiple integral over
the n sample values, and proves to be $(1-2\alpha/n)^{-\frac{1}{2}(n-1)}$,
which differs from that of the $s^2$ in 70 only in the exponent,
n-1 replacing n. It follows that the distribution of $s^2$ is
again of $\chi^2$ type, its probability function being in fact

$$\phi(s^2) = \tfrac{1}{2}n(\Gamma(\tfrac{1}{2}(n-1)))^{-1}(\tfrac{1}{2}ns^2)^{\frac{1}{2}(n-3)}e^{-\frac{1}{2}ns^2}, \qquad (2)$$

which should be compared with that of 70 (4).

This distribution is called Helmert’s distribution, after
the German astronomer and geodetist F. R. Helmert, who
published it in 1876.

By expanding the m.g.f. and noting the coefficient of
$\alpha/1!$ we find that the mean value of $s^2$ over all samples
of n is $(n-1)\sigma^2/n$, where $\sigma^2$ is the variance of x. This
is really the theorem of mean square residual of 59 (5),
and it is true not merely for normal but for general
populations. Because of the factor $(n-1)/n$ the precept is often
given to estimate variance by dividing the sum of squared
deviations from m not by n but by n-1. On the other
hand, the discrepancy in $s^2$ caused by not doing this is
of order $1/n$, whereas the standard error of sampling of $s^2$
is of order $1/\sqrt{n}$. Thus the error of method is to the
error of sampling in approximate ratio $1 : \sqrt{2n}$, which
even for n as small as 25 is less than 1/7. To insist on
the divisor n-1 rather than n in large samples may
therefore seem a little pedantic; but in small samples
an appreciable difference is made. One advantage of the
division by n-1 is this, that with the modified $s^2$ the
probability function (2) assumes the form

$$\phi(s^2) = \tfrac{1}{2}(n-1)(\Gamma(\tfrac{1}{2}(n-1)))^{-1}(\tfrac{1}{2}(n-1)s^2)^{\frac{1}{2}(n-3)}e^{-\frac{1}{2}(n-1)s^2}, \qquad (3)$$

which is now of exactly the same form as in 70 (4), with
n-1 for n throughout. Thus the loss of a degree of
freedom is made apparent. In unstandardized units we
must write $s^2/\sigma^2$ for $s^2$ and insert on the right of (3) the
factor $1/\sigma^2$.

The m.g.f. of $s^2$ in (3) is $[1-2\alpha/(n-1)]^{-\frac{1}{2}(n-1)}$, from
which it follows, as in 70 (5), that the sampling variance
of this modified $s^2$ is $2\sigma^4/(n-1)$, the standard error thus
being $\sigma^2\sqrt{2}/\sqrt{n-1}$.

Example. By considering the coefficients of $\alpha^3/3!$ and
$\alpha^4/4!$ in the s.g.f. investigate the skewness and excess of the
distribution of $s^2$.

72. “Student’s Ratio” t and its Distribution. We
have seen in 69 that the mean m of a sample of n values
$x_j$ drawn from a normal population of mean $\mu$ and variance
$\sigma^2$ is distributed normally with mean $\mu$ and variance $\sigma^2/n$.
It follows that the standardized deviation $(m-\mu)\sqrt{n}/\sigma$ is
distributed normally with mean zero and variance 1.




Now in practice we do not know $\sigma^2$, and so we cannot
standardize the scale. All that we know is the estimate
(taking n-1 for divisor) $s^2 = \Sigma(x_j-m)^2/(n-1)$. The
deviation of the mean of sample from true mean,
standardized by this estimate s, is thus $(m-\mu)\sqrt{n}/s = t$. This
is “Student’s Ratio,” and it is not normally distributed.

“Student” was the pen-name under which W. S. Gosset
(1876-1937) wrote his statistical papers. He discovered the
distribution in 1908.


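To simplify the distribution we may place the origin
of x at $x = \mu$, thus putting $\mu = 0$. Then $m\sqrt{n}/s = t$.
Since $m\sqrt{n}/s = (m\sqrt{n}/\sigma)/(s/\sigma)$, and since the distributions
of $m\sqrt{n}/\sigma$ and $s/\sigma$ are independent of $\sigma$, we may use
standard scale with $\sigma = 1$.

For constant $s^2$ we have $dm = s\,dt/\sqrt{n}$; also the
probability of obtaining the value t is the probability that
m takes the value $st/\sqrt{n}$, and the probability differential
for this is

$$\frac{1}{\sqrt{2\pi}}\,s\,e^{-\frac{1}{2}s^2t^2}\,dt. \qquad (1)$$

This is for constant $s^2$; and so the probability differential
of t is the integral of (1) over all values of $s^2$. Hence,
multiplying (1) by the probability differential of $s^2$, which
we already know from 71 (3), and integrating from 0 to $\infty$,
we have

$$dp(t) = c_1\,dt\int_0^\infty s\,e^{-\frac{1}{2}s^2t^2}(s^2)^{\frac{1}{2}(n-3)}e^{-\frac{1}{2}(n-1)s^2}\,d(s^2) = c_2\left[1 + \frac{t^2}{n-1}\right]^{-\frac{1}{2}n}dt, \qquad (2)$$

where

$$c_2 = \frac{\Gamma(\frac{1}{2}n)}{\sqrt{(n-1)\pi}\;\Gamma(\frac{1}{2}(n-1))}, \qquad (3)$$

the constant $c_2$ being fixed, as always, by the condition
that the total probability is 1.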


Note. The above derivation is the one usually given,
but an important remark must be made. The essential step,
the compounding of the probability differentials of m and $s^2$,
presumes the statistical independence of m and $s^2$. This
independence (Appendix, 6) is not evident, nor is it
capable of quite elementary proof. The reader may assume,
however, both here and in the case of the difference of means
of two samples, that the numerator and denominator of t
are independent.

The remarkable and important fact about the
t-distribution is that it does not involve the unknown $\sigma^2$,
a partial reason being that t is a ratio, of zero dimension
in $\sigma$. The discovery of the distribution in 1908 had a
profound influence on “small sample theory”; for
whereas it had long been conventional to take s as the
presumptive $\sigma$ and to estimate the probable region of the
unknown $\mu$ by regarding $(m-\mu)\sqrt{n}/s$ as a standardized
normal variate, this was now seen to be an inexact
procedure, and the t-distribution was used instead.

Since $[1 + t^2/(n-1)]^{-\frac{1}{2}n}$ tends with increasing n to
$\exp(-\frac{1}{2}t^2)$, it is apparent that for large samples the
t-distribution tends to the standard normal one; but the
tendency is not rapid, and for small values of n, as one
might suspect from noting that n = 2 gives the Cauchy
distribution, the departure from normality is marked, the
curves being platykurtic. For example, whereas in the
normal curve 0.95 of the area is contained in the range
x = -1.96 to x = 1.96, in the t-curve for n = 10 the same
area lies between t = -2.26 and t = 2.26; and for area
0.99 the ranges are given by $x = \pm 2.58$ and $t = \pm 3.25$.

A table of the probability integral of the t-distribution,
in a form useful for practical application, is given in R. A.
Fisher’s Statistical Methods for Research Workers, 8th edition,
p. 167. His n is our n-1, the number of degrees of freedom.

Example. A coin, thrown 20 times on each of 10 occasions,
shows 7, 9, 6, 10, 13, 6, 9, 7, 10, 7 heads respectively.
Assuming the binomial distribution of 20 throws to be
approximately normal, consider whether the coin is biassed.

The mean of the heads thrown is m = 8.4 and $s^2$ = 4.93,
s = 2.22. Thus, presuming an unbiassed $\mu = 10$, we have
$t = (10-8.4)\sqrt{10}/2.22 = 2.28$. From Fisher’s tables, in the
row n = 9 (our n-1) we find t = 2.262 at P = 0.05. (P is
the probability of a t numerically greater than 2.262.) Thus
the coin-throws leave it rather doubtful whether the coin is
biassed or not.

A reading of tables of the normal probability integral for
x = 2.28 would have given P = 0.023, with an unjustifiably
stronger suggestion of bias in the coin.
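
The figures of this example can be reproduced directly. The following is an illustrative sketch, not part of the original text; it computes m, $s^2$ (divisor n-1) and t, and also the two-sided normal tail that a naive reading of normal tables would give. The critical value of t is still to be taken from tables, since the standard library used here has no t-distribution.

```python
import math

heads = [7, 9, 6, 10, 13, 6, 9, 7, 10, 7]
n = len(heads)
mu = 10                                   # expected heads for an unbiassed coin

m = sum(heads) / n
s2 = sum((x - m) ** 2 for x in heads) / (n - 1)   # estimate with divisor n-1
s = math.sqrt(s2)
t = (mu - m) * math.sqrt(n) / s

# Two-sided tail if t were (wrongly) treated as a standard normal variate.
normal_tail = math.erfc(t / math.sqrt(2))

print(round(m, 1), round(s2, 2), round(s, 2), round(t, 2), round(normal_tail, 3))
```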

73. Difference of Means of Two Normal Samples.
A valuable use of the t-distribution is in testing the
hypothesis that two samples, with different numbers n
and N in sample, are from the same normal population,
of mean $\mu = 0$ and variance $\sigma^2$.

Let $x_1, x_2, \dots, x_n$ be the first sample, $X_1, X_2, \dots, X_N$
be the second, with respective means m and M, and
estimates of variance

$$s^2 = \Sigma(x_j-m)^2/(n-1), \qquad S^2 = \Sigma(X_j-M)^2/(N-1). \qquad (1)$$

The basis of the test is the difference m-M, the
variance of which (16) is

$$\sigma^2/n + \sigma^2/N = (n+N)\sigma^2/nN. \qquad (2)$$

The estimates $s^2$ and $S^2$ of $\sigma^2$ are (71) of weights n-1
and N-1, and so yield a combined estimate of $\sigma^2$, namely

$$\bar{s}^2 = [(n-1)s^2 + (N-1)S^2]/(n+N-2) = [\Sigma(x_j-m)^2 + \Sigma(X_j-M)^2]/(n+N-2). \qquad (3)$$

It can be proved that m-M and $\bar{s}^2$ are statistically
independent. We therefore define, from (2) and (3),

$$t = \frac{m-M}{\bar{s}}\sqrt{\frac{nN}{n+N}}, \qquad (4)$$

and it now follows, by the argument of 72, that this t
has the t-distribution, but with n+N-2, the number of
degrees of freedom used in estimating $\sigma^2$, in place of the
former n-1. Thus the t-tables may be consulted for the
probability P(t) that t numerically exceeds any assigned
value. (The examples of p. 109 are amenable to t-tests.)

The important point is the way in which $\sigma^2$ is estimated.
One might have pooled both samples and estimated $\sigma^2$ from
the squared n+N deviations about the pooled mean
$(nm+NM)/(n+N)$, summed and divided by n+N-1. This
slightly more accurate estimate of $\sigma^2$ is, however, not
independent of m-M.
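
A minimal sketch of the two-sample computation follows. It is not from the original text: the function name `pooled_t` and the two small samples are hypothetical, chosen only to show the arithmetic of the combined estimate and of t with n+N-2 degrees of freedom.

```python
import math

def pooled_t(x, X):
    """Return (t, degrees of freedom) for two samples, using the combined
    estimate of variance with divisor n + N - 2."""
    n, N = len(x), len(X)
    m, M = sum(x) / n, sum(X) / N
    sum_sq = sum((v - m) ** 2 for v in x) + sum((v - M) ** 2 for v in X)
    combined_variance = sum_sq / (n + N - 2)
    # Variance of m - M is sigma^2 * (n + N) / (n * N).
    t = (m - M) / math.sqrt(combined_variance * (n + N) / (n * N))
    return t, n + N - 2

# Hypothetical samples, for illustration only.
first = [4.1, 5.0, 4.6, 5.3]
second = [5.2, 6.1, 5.8, 6.4, 5.5]
t, dof = pooled_t(first, second)
print(round(t, 3), dof)
```

The value of t would then be referred to the t-table in the row for the stated degrees of freedom.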

74. The Ratio of Two Variates of the Same
Type. The two samples in 73 give in general different
estimates $s^2$ and $S^2$ of the variance $\sigma^2$. If the question
is whether both samples are from the same normal
population, we shall wish to test this by means of $s^2$ and $S^2$,
without reference to the unknown $\sigma^2$. The analogy of
t suggests the ratio $u = s^2/S^2$. Since u is unaltered when
we write $s^2/\sigma^2$ for $s^2$ and $S^2/\sigma^2$ for $S^2$, we may work in
standard scale, using the $s^2$-distribution of 71. Let us
write $v = s^2/\sigma^2$, $V = S^2/\sigma^2$. Then $u = v/V$, or $v = uV$.
By 71 the probability differentials of v and V are

$$c_1v^{\frac{1}{2}(n-3)}e^{-\frac{1}{2}(n-1)v}\,dv \quad\text{and}\quad c_2V^{\frac{1}{2}(N-3)}e^{-\frac{1}{2}(N-1)V}\,dV. \qquad (1)$$

For fixed V we have $dv = V\,du$; so, integrating for all V,
we have the probability differential of u,

$$dp = c_1c_2\,du\int_0^\infty V(uV)^{\frac{1}{2}(n-3)}e^{-\frac{1}{2}(n-1)uV}\,V^{\frac{1}{2}(N-3)}e^{-\frac{1}{2}(N-1)V}\,dV = c\,u^{\frac{1}{2}(n-3)}[(n-1)u + N-1]^{-\frac{1}{2}(n+N-2)}\,du, \qquad (2)$$

where c is fixed by making the total integral unity.

The distribution of u is thus given by (2). It is
interesting to verify that as $N \to \infty$ the distribution tends
to the $s^2$-distribution of 71, while if n = 2 we have a
$t^2$-distribution.

The z-Distribution of Fisher. R. A. Fisher, in
testing the difference of two estimates $s^2$ and $S^2$, uses
not this ratio u but half its natural logarithm. If we put

$$z = \tfrac{1}{2}\log_e u, \qquad u = e^{2z}, \qquad du = 2e^{2z}\,dz,$$

the probability differential (2) becomes

$$dp = c_3\,e^{(n-1)z}[(n-1)e^{2z} + N-1]^{-\frac{1}{2}(n+N-2)}\,dz, \qquad (3)$$

where $c_3$ is such that the total integral is unity. The
distribution thus obtained is Fisher’s z-distribution.

Tables of P(z), the probability of a z greater than an
assigned value, are given in Fisher’s Statistical Methods for
Research Workers. In these tables the numerator of u is
the greater of $s^2$ and $S^2$, so that z is positive; and the
functions tabled are the values of z for assigned n and N,
such that P = 0.05, 0.01 and 0.001 respectively. The table
for P = 0.001 is due to C. G. Colcord and L. S. Deming.

75. Analysis of Variance and of Sum of Squares.

The basic idea of the experimental designs introduced by
R. A. Fisher, and of the accompanying technique called
analysis of variance, is that of dividing up a total sum of
squared deviations of a variate from its sample mean into
several distinct sums of squares, each corresponding to a
source, real or suspected, of variation. These partial sums
yield estimates of the variance from each source, and the
z-test is applied to ascertain whether these estimates are
compatible with each other and with the estimate of residual
variance. If they are not so compatible, it is presumed
that the sources have distinct effects, which are further
analysed, for example by difference (73) of means.

The resolution into sums of squares is founded on the
Lemma, noted in 62 in connexion with the correlation
ratio, that if h sets of $n_1, n_2, \dots, n_h$ observations, with
respective means $M_j$ and mean square deviations $S_j^2$, are
pooled in an aggregate of $n = n_1 + n_2 + \dots + n_h$ observations,
with mean M and mean square deviation $S^2$, then

$$nS^2 = \sum_j n_j(S_j^2 + c_j^2), \qquad (1)$$

where $c_j = M - M_j$.

For illustration we shall consider an experiment based
on repeated trials and designed to ascertain (i) whether h
varieties of a cereal are different in crop yield, (ii) whether
k kinds of fertilizing treatment are different in their effect
on the crop yield of the h varieties.

Consider first the case (i), the experiment on varieties
alone. Suppose each of them planted in k similar plots,
assigned in random positions in a field, and subjected to
uniform cultivation. The hk yields $y_{ij}$, where i refers to
variety, j to plot-number, may be arranged for analysis
in a rectangular scheme of h rows and k columns, a row
to each variety. For convenience in the algebra let us
choose the origin of y so that the sum or mean of all $y_{ij}$
is zero.

Now consider the sum $\sum_i\sum_j y_{ij}^2$ over all hk deviations.
Let the means of rows (varieties) be $y_{10}, y_{20}, \dots, y_{h0}$.
Then by (1), remembering that the general mean is zero,
we have

$$\sum_i\sum_j y_{ij}^2 = \sum_i\sum_j (y_{ij}-y_{i0})^2 + k\sum_i y_{i0}^2. \qquad (2)$$

The sums here are sums of squared residuals, and under
the assumption that all plot-yields have zero mean and
variance $\sigma^2$, the mean values or expectations of the terms
give, by 59 (5), the relation

$$(hk-1)\sigma^2 = h(k-1)\sigma^2 + (h-1)\sigma^2, \qquad (3)$$

where the terms correspond to those in (2). The first
term on the right follows from the fact that the mean
value or expectation of the sum of the k squared deviations
for any row is $(k-1)\sigma^2$; and the second term then follows
by subtraction.

The coefficients in (3) are really degrees of freedom;
and we thus distinguish hk-1 degrees of freedom for all
hk plots, of which h-1 are for variation between means
of rows, that is, between varieties, and h(k-1) are for
variation about the particular variety means $y_{i0}$, that is,
within varieties.

If the hypothesis to be tested is that varieties are not
essentially different in yield, this is the same as to suppose
that variation between varieties is subject to the same
cause as variation within varieties, that is, to ordinary
randomness arising from soil heterogeneity and other
causes common to all plots. The test is therefore to
compute an $s^2$ from the sum of squares between varieties
and an $S^2$ from the sum of squares within varieties, these
being independent estimates of $\sigma^2$, and to see from the
z-table whether they are compatible. In the calculation
of $s^2$ and $S^2$ the respective degrees of freedom should be
used as divisors; and $\sum_i\sum_j (y_{ij}-y_{i0})^2$ is most easily calculated by
means of

$$\sum_i\sum_j (y_{ij}-y_{i0})^2 = \sum_i\sum_j y_{ij}^2 - k\sum_i y_{i0}^2. \qquad (4)$$
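
The dissection between and within varieties can be illustrated numerically. The sketch below is not from the original text: the yields are hypothetical, the helper name `one_way_sums` is an assumption, and deviations are taken from the general mean instead of shifting the origin to make that mean zero.

```python
import math

def one_way_sums(y):
    """Split the total sum of squares of an h-by-k table into the parts
    within rows and between row means."""
    h, k = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (h * k)
    row_means = [sum(row) / k for row in y]
    total = sum((v - grand) ** 2 for row in y for v in row)
    within = sum((v - rm) ** 2 for row, rm in zip(y, row_means) for v in row)
    between = k * sum((rm - grand) ** 2 for rm in row_means)
    return total, within, between

# Hypothetical yields: 3 varieties (rows), 4 plots each (columns).
yields = [[20, 22, 19, 23],
          [25, 27, 24, 28],
          [21, 20, 22, 21]]
h, k = len(yields), len(yields[0])
total, within, between = one_way_sums(yields)

s2_between = between / (h - 1)          # h - 1 degrees of freedom
s2_within = within / (h * (k - 1))      # h(k - 1) degrees of freedom
z = 0.5 * math.log(s2_between / s2_within)
print(round(total, 3), round(within, 3), round(between, 3), round(z, 3))
```

The value of z would be compared with the tabled z for h-1 and h(k-1) degrees of freedom.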

76. Analysis into Two Sources of Variation and
Residual. Next, still with the same h-by-k arrangement
(which in the random placing of plots in the field is
called the “randomized block” arrangement), let the
rectangle of h rows and k columns of yields be set out
for analysis in the case when there are not only h different
varieties, but each is subjected to k different treatments,
so that $y_{ij}$ is the yield of the ith variety under the jth
treatment. Let the means of columns (treatments) be
$y_{01}, y_{02}, \dots, y_{0k}$.

Consider the term $\sum_i\sum_j (y_{ij}-y_{i0})^2$ in 75 (2), and imagine
all the deviations from mean of variety, $y_{ij}-y_{i0}$, to be
set out in a rectangle just as the $y_{ij}$ were. Since
$\sum_i y_{i0} = \sum_i\sum_j y_{ij}/k = 0$, the means of the $y_{ij}-y_{i0}$ in columns
are merely those of the $y_{ij}$ themselves, namely $y_{01}, y_{02},
\dots, y_{0k}$.

Hence, by analysing this term exactly as $\sum_i\sum_j y_{ij}^2$ was,
but with respect to column means instead of row means,
we have

$$\sum_i\sum_j (y_{ij}-y_{i0})^2 = \sum_i\sum_j (y_{ij}-y_{i0}-y_{0j})^2 + h\sum_j y_{0j}^2. \qquad (1)$$

Hence again, from 75 (2),

$$\sum_i\sum_j y_{ij}^2 = \sum_i\sum_j (y_{ij}-y_{i0}-y_{0j})^2 + k\sum_i y_{i0}^2 + h\sum_j y_{0j}^2, \qquad (2)$$

which exhibits a threefold dissection, the last two terms
on the right corresponding to variation between varieties
and between treatments respectively, and the first term
to residual variation. As for degrees of freedom, by taking
expectations as before of these sums of squared residuals,
we have

$$(hk-1)\sigma^2 = (h-1)(k-1)\sigma^2 + (h-1)\sigma^2 + (k-1)\sigma^2, \qquad (3)$$

the coefficients giving the desired divisors of corresponding
terms on the right of (2), for estimates of variance. The
comparison of estimates by the z-table is then available.


77. The Latin Square. In the arrangement shown
below, each of A, B, C, D, E appears exactly five times in
rows and columns of a square, but no letter occurs twice
in the same row or same column. Such an arrangement of
h letters each repeated h times is called a Latin square
of order h.

    A  B  C  D  E
    E  C  A  B  D
    B  D  E  C  A
    D  E  B  A  C
    C  A  D  E  B

Imagine the Latin square to be a scheme of plot-yields
set up for analysis, the letters representing yields of
different varieties, the rows corresponding to varied
treatments of one kind, the columns to varied treatments
of another kind; for example, two kinds of fertilizer
applied at once, at h different levels of strength in each.
There are thus three dimensions of variation, two for
treatments and one for variety; and so the yields may
be written $y_{ijl}$, where i refers to row, j to column, l to
variety. Let the respective means for rows, columns and
varieties be $y_{i00}$, $y_{0j0}$ and $y_{00l}$. Each suffix runs from 1 to h.





A first analysis as in 76 (2) gives us

$$\sum_i\sum_j y_{ijl}^2 = \sum_i\sum_j (y_{ijl}-y_{i00}-y_{0j0})^2 + h\sum_i y_{i00}^2 + h\sum_j y_{0j0}^2. \tag{1}$$

But now arrange the residuals $y_{ijl}-y_{i00}-y_{0j0}$ according
to $l = 1, 2, \ldots, h$, and analyse once again. Since the
means of $y_{i00}$ and $y_{0j0}$ are zero, the means of
$y_{ijl}-y_{i00}-y_{0j0}$ are simply $y_{00l}$, where $l = 1, 2, \ldots, h$.
We therefore have

$$\sum_i\sum_j y_{ijl}^2 = \sum_i\sum_j (y_{ijl}-y_{i00}-y_{0j0}-y_{00l})^2 + h\sum_i y_{i00}^2 + h\sum_j y_{0j0}^2 + h\sum_l y_{00l}^2, \tag{2}$$

where the three last terms on the right are sums of squares
for variation between rows, columns and varieties respectively,
and the first term is for residual variation. By
taking the expectations of these sums of squared residuals
we have

$$(h^2-1)\sigma^2 = (h-1)(h-2)\sigma^2 + (h-1)\sigma^2 + (h-1)\sigma^2 + (h-1)\sigma^2, \tag{3}$$

which shows the respective degrees of freedom to be used
as divisors in the estimates of variance.

Example. The entries in the square below are the numbers
of successes in 25 sets of 10 drawings with probability p = 0.52,
written down consecutively in 5 rows. The mean squares of
the analyses may be compared with the theoretical $\sigma^2$, which
is 10 × 0.52 × 0.48 = 2.50.

The working details, based on the formulae of 75, 76, 77,
are shown (i) in ordinary row and column analysis, applicable
equally to h rows and k columns, (ii) in Latin square analysis,
using the particular Latin square (q.v.) given above.


(i)                                    Sums   Means
             5     3     2     4     6    20     4.0
             6     7     5     4     5    27     5.4
             3     6     3     6     5    23     4.6
             8     3     7     6     4    28     5.6
             6     2     6     4     5    23     4.6
    Sums    28    21    23    24    25   121
    Means  5.6   4.2   4.6   4.8   5.0          4.84

    Latin sq.   Sums   Means
    A            23     4.6
    B            22     4.4
    C            25     5.0
    D            29     5.8
    E            22     4.4
    Total       121     4.84


The various sums of squares used are: (1) the sum of
squares of all 25 entries, namely 647; (2) the sum of the five
products, row-sum by row-mean, 594.2; (3) the same for
columns, 591.0; (4) the same for letters in Latin square,
592.6. Each one of these must be corrected for transference
to the general mean 4.84, and the correction in every case is
to subtract the product of total sum by total mean, 121 by
4.84, or 585.64.

Thus the corrected sums of squares are (1) 61.36, (2) 8.56,
(3) 5.36, (4) 6.96. The residual sum of squares is found by
subtraction from the total sum, and the details of estimate of
mean square are set out in tabular form thus:


(i) Row and column analysis.

               Sum sq.   Degr.   Mean sq.
    Rows         8.56       4      2.14
    Cols.        5.36       4      1.34
    Res.        47.44      16      2.97
    Total       61.36      24      2.56

(ii) Latin square analysis.

               Sum sq.   Degr.   Mean sq.
    Rows         8.56       4      2.14
    Cols.        5.36       4      1.34
    Letters      6.96       4      1.74
    Res.        40.48      12      3.37
    Total       61.36      24      2.56

We need not continue, but in practice the mean squares
for rows, columns and letters would be compared with each
other and with the residual mean square by taking half the
difference of logarithms and applying the z-test. (The
logarithms are Napierian, and we may note the relation
$\tfrac12\log_e u = 1.151\,\log_{10} u$.)
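
The arithmetic of the Example can be retraced with a short Python sketch. The variable names are illustrative assumptions. The data and the Latin square are those of the Example and of 77, and the quantity `z_rows` is half the difference of natural logarithms of two mean squares, as described above.

```python
from math import log

# Data of the Example: 25 counts written in 5 rows.
data = [
    [5, 3, 2, 4, 6],
    [6, 7, 5, 4, 5],
    [3, 6, 3, 6, 5],
    [8, 3, 7, 6, 4],
    [6, 2, 6, 4, 5],
]
# The Latin square of 77, read row by row.
square = ["ABCDE", "ECABD", "BDECA", "DEBAC", "CADEB"]

h = len(data)
total_sum = sum(map(sum, data))
correction = total_sum * total_sum / (h * h)      # total sum times total mean

def corrected(group_sums):
    """Sum of (group sum x group mean), corrected to the general mean."""
    return sum(s * s / h for s in group_sums) - correction

row_sums = [sum(r) for r in data]
col_sums = [sum(data[i][j] for i in range(h)) for j in range(h)]
letter_sums = {c: 0 for c in "ABCDE"}
for i in range(h):
    for j in range(h):
        letter_sums[square[i][j]] += data[i][j]

ss_total = sum(v * v for r in data for v in r) - correction
ss_rows = corrected(row_sums)
ss_cols = corrected(col_sums)
ss_letters = corrected(letter_sums.values())
ss_res_rc = ss_total - ss_rows - ss_cols          # (i) residual, 16 degrees of freedom
ss_res_latin = ss_res_rc - ss_letters             # (ii) residual, 12 degrees of freedom

ms_rows = ss_rows / (h - 1)
ms_res_latin = ss_res_latin / ((h - 1) * (h - 2))
z_rows = 0.5 * log(ms_rows / ms_res_latin)        # to be referred to the z-table
```

The sums of squares so obtained agree with the two tables of the Example.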

The principle of isolation, by appropriate experimental 
design, of the separate variations due to several simul- 
taneous causes, has been developed and widely applied 
in recent years. Complex patterns, such as randomized 
blocks in which each element is itself a block, or Latin 
squares in which each “ letter ” is a Latin square, have 
been designed and used. The idea is to save time, space 
and expense by being able to conduct several kinds of 
experiment at the same time and within the one frame. 
For further details the reader may consult Fisher’s The 
Design of Experiments, 2nd edition, or Yates’s The Design 
and Analysis of Factorial Experiments (Harpenden, 1937). 


78. Conclusion. The consideration of other sampling
distributions would exceed our space and scope, but one
of special interest may be noted. The distribution of r,
the standardized product-moment estimate (without
Sheppard’s correction) of $\rho$ in normal correlation, was
found by R. A. Fisher in 1915. The probability function
has the rather complicated form

$$\frac{(1-\rho^2)^{\frac12(n-1)}(1-r^2)^{\frac12(n-4)}}{\pi\,(n-3)!}\;\frac{d^{\,n-2}}{d(r\rho)^{n-2}}\left\{\frac{\arccos(-r\rho)}{\sqrt{1-r^2\rho^2}}\right\}, \tag{1}$$

and the curve, if $\rho$ is at all large and the sample small,
is skew and in cases even U-shaped. (The function and
its integral have been computed, for n = 3, 4, ..., 25, 50,
100, 200, 400 and $\rho$ = 0.1, 0.2, 0.3, ..., 0.9, by F. N.
David in Tables of the Correlation Coefficient, London,
1938.)

It was proved by Fisher (Metron, 1921) that the
hyperbolic tangent transformation

$$r = \tanh z, \qquad \rho = \tanh\zeta, \tag{2}$$

produces a distribution of z which even for n as small as 20
is nearly normal, with mean $\zeta$ and variance 1/(n-3).

A second transformation of r, namely

$$t = \frac{r}{\sqrt{1-r^2}}\,\sqrt{n-2}, \tag{3}$$

leads to a t-distribution with n-2 degrees of freedom
when $\rho = 0$. These transformations are necessary, because of
the extreme non-normality of the sampling distribution of r,
which makes the crude use of the standard error of r a
fallacious procedure.
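
Both transformations are easily computed. The Python sketch below is illustrative only: the function names, the sample values r = 0.6 and n = 28, and the use of 1.96 as the normal deviate for approximate 95 per cent limits are assumptions made for the example.

```python
from math import atanh, sqrt, tanh

def fisher_z(r, n):
    """Return z = artanh(r) and its approximate standard error 1/sqrt(n - 3)."""
    return atanh(r), 1.0 / sqrt(n - 3)

def t_from_r(r, n):
    """Student's t with n - 2 degrees of freedom, for testing rho = 0."""
    return r * sqrt(n - 2) / sqrt(1.0 - r * r)

r, n = 0.6, 28                       # illustrative values only
z, se = fisher_z(r, n)
low = tanh(z - 1.96 * se)            # approximate lower limit for rho
high = tanh(z + 1.96 * se)           # approximate upper limit for rho
t = t_from_r(r, n)
```

The limits are found on the z scale, where the distribution is nearly normal, and are then carried back to the scale of r by the hyperbolic tangent.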


79. Estimation of Parameters from Sample. In
40 and 42 we have estimated the mean $\mu$ of a normal and
a Poisson distribution by the mean m of the sample; in
43 we have pointed out the demerits of the mean of sample
in estimating the true mean of a Cauchy distribution.
The general problem of estimation is this: given n sample
values $x_1, x_2, \ldots, x_n$ of a variate x with probability function
$\phi(x;\theta)$ involving a parameter $\theta$, what function
$T(x_1, x_2, \ldots, x_n)$ of the sample values shall be used to
estimate $\theta$? The problem must be posed in mathematical terms,
and must, in order to become intelligible, assume a certain
degree of arbitrariness. One fruitful principle, well
justified by its results, consists in choosing T by making
the compound probability density of $x_1, x_2, \ldots, x_n$ a maximum
with respect to $\theta$. This is R. A. Fisher’s principle of
maximum likelihood. Another principle postulates (i) that
T shall be unbiassed, in the sense that the mean value of
T over all samples of n data shall be equal to $\theta$, (ii) that of
all such functions T shall be the one with minimum sampling
variance. In many cases these two different approaches
(the second of which has not yet been deeply explored)
lead to the same function T of estimate. The situation is
parallel to that which occurs in the theory of Least Squares,
where, as mentioned at the end of 57, different sets of
postulates lead to the same normal equations.
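
As a brief illustration of the first principle, consider a sample $x_1, \ldots, x_n$ from a Poisson distribution of mean $\mu$. This worked case is a sketch added for clarity; the symbol $L$ for the compound probability is an assumption of notation.

```latex
\begin{align*}
L(\mu) &= \prod_{i=1}^{n} e^{-\mu}\,\frac{\mu^{x_i}}{x_i!}, \\
\log L(\mu) &= -n\mu + \Bigl(\sum_{i} x_i\Bigr)\log\mu - \sum_{i}\log x_i!, \\
\frac{d\log L}{d\mu} &= -n + \frac{1}{\mu}\sum_{i} x_i = 0
\quad\Longrightarrow\quad \hat\mu = \frac{1}{n}\sum_{i} x_i = m .
\end{align*}
```

The maximum is therefore attained at the mean m of the sample; and since the mean value of m over all samples is $\mu$, this estimate is also unbiassed.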
80. Four-Place Table of $\Phi(x) = (2\pi)^{-\frac12}\int_0^x e^{-\frac12 x^2}\,dx$.

      x     Φ        x     Φ        x     Φ        x      Φ
    0.00  0000     0.50  1915     1.00  3413     1.50   4332
    0.02  0080     0.52  1985     1.02  3461     1.55   4394
    0.04  0160     0.54  2054     1.04  3508     1.60   4452
    0.06  0239     0.56  2123     1.06  3554     1.65   4505
    0.08  0319     0.58  2190     1.08  3599     1.70   4554
    0.10  0398     0.60  2257     1.10  3643     1.75   4599
    0.12  0478     0.62  2324     1.12  3686     1.80   4641
    0.14  0557     0.64  2389     1.14  3729     1.85   4678
    0.16  0636     0.66  2454     1.16  3770     1.90   4713
    0.18  0714     0.68  2517     1.18  3810     1.95   4744
    0.20  0793     0.70  2580     1.20  3849     2.00   4772
    0.22  0871     0.72  2642     1.22  3888     2.10   4821
    0.24  0948     0.74  2703     1.24  3925     2.20   4861
    0.26  1026     0.76  2764     1.26  3962     2.30   4893
    0.28  1103     0.78  2823     1.28  3997     2.40   4918
    0.30  1179     0.80  2881     1.30  4032     2.50   4938
    0.32  1255     0.82  2939     1.32  4066     2.60   4953
    0.34  1331     0.84  2995     1.34  4099     2.70   4965
    0.36  1406     0.86  3051     1.36  4131     2.80   4974
    0.38  1480     0.88  3106     1.38  4162     2.90   4981
    0.40  1554     0.90  3159     1.40  4192     3.00   49865
    0.42  1628     0.92  3212     1.42  4222     3.20   49931
    0.44  1700     0.94  3264     1.44  4251     3.40   49966
    0.46  1772     0.96  3315     1.46  4279     3.60   49984
    0.48  1844     0.98  3365     1.48  4306     3.80   49993
    0.50  1915     1.00  3413     1.50  4332     4.00   49997

A decimal point is understood before each entry Φ(x);
and second difference interpolation is advisable in the last
column.

A useful inverse table of the normal probability integral
is the table of Probits in Fisher and Yates’s Statistical Tables
for Biological, Agricultural and Medical Research (Oliver and
Boyd, 1938), pp. 38-40. The “probit” is the value of x which
cuts off at its ordinate a given percentage of area measured
from the left of the normal curve.
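
Where a computer is at hand, entries of the table and of its inverse may be reproduced directly. The Python sketch below is an illustration under stated assumptions: the function names `phi` and `deviate` are invented here, and `deviate` returns the plain normal deviate for a given percentage of area, without any additive constant that a published table of probits may employ.

```python
from math import erf, sqrt
from statistics import NormalDist

def phi(x):
    """Area under the standard normal curve between 0 and x."""
    return 0.5 * erf(x / sqrt(2.0))

def deviate(percent):
    """Normal deviate cutting off the given percentage of area from the left."""
    return NormalDist().inv_cdf(percent / 100.0)

entry = round(phi(1.0), 4)           # compare with the tabulated value at x = 1.00
x_975 = deviate(97.5)                # deviate leaving 97.5 per cent of area to its left
```

Since the table gives the area from 0 to x, the area from the left of the curve up to x is 0.5 + Φ(x).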
1. Finite Differences and Factorial Polynomials. Most
tables of functions provide us with sequences of values which
by a suitable choice of origin and scale may be denoted by
$u_0, u_1, u_2, \ldots$. To these we may apply differencing and repeated
differencing, analogous in the Calculus of Finite Differences
to differentiation in the Infinitesimal Calculus (see Whittaker
and Robinson, Calculus of Observations (Blackie), Chapters I
to IV). The operations most commonly used are:

    the advancing difference,     $\Delta u_x = u_{x+1} - u_x$,
    the receding difference,      $\nabla u_x = u_x - u_{x-1}$,
    the central difference,       $\delta u_x = u_{x+\frac12} - u_{x-\frac12}$,          (1)
    the averaging operation,      $\mu u_x = \tfrac12(u_{x+\frac12} + u_{x-\frac12})$,
    the mean central difference,  $\mu\delta u_x = \tfrac12(u_{x+1} - u_{x-1})$,

operations which may all be repeated. The classical formula
of interpolation, which uses advancing differences derived
from $u_0, u_1, u_2, \ldots$, is the Gregory-Newton formula

$$u_x = u_0 + x\,\Delta u_0 + x^{(2)}\Delta^2 u_0/2! + x^{(3)}\Delta^3 u_0/3! + \ldots, \tag{2}$$

a formula which terminates at n+1 terms if $u_x$ is a polynomial
of degree n, and which in practical cases converges well,
with negligible remainder, after a few terms. The formula
is the analogue of the Taylor series in the Infinitesimal Calculus.
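
The Gregory-Newton formula may be sketched in Python as follows. The function names are illustrative assumptions, and the table of cubes is chosen only because the series must then terminate exactly after four terms.

```python
def forward_differences(u):
    """Leading differences u_0, Δu_0, Δ²u_0, ... of the tabulated values u."""
    leading, row = [], list(u)
    while row:
        leading.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return leading

def gregory_newton(u, x):
    """Interpolate u_x from u_0, u_1, ..., with x measured in tabular intervals."""
    total, coeff = 0.0, 1.0                  # coeff holds x^(r)/r!
    for r, d in enumerate(forward_differences(u)):
        total += coeff * d
        coeff *= (x - r) / (r + 1)           # pass from x^(r)/r! to x^(r+1)/(r+1)!
    return total

cubes = [x ** 3 for x in range(5)]           # a polynomial of degree 3
value = gregory_newton(cubes, 2.5)           # agrees with 2.5 cubed
```

The running coefficient is the reduced factorial, so that no factorials need be formed separately.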

The polynomials $1, x, x^{(2)}, x^{(3)}, \ldots$ which appear in (2)
are (29) ordinary factorial polynomials $1, x, x(x-1),
x(x-1)(x-2), \ldots$. If they are divided respectively by 0!,
1!, 2!, 3!, ... we obtain the reduced factorials or binomial
coefficients $1, x, x_{(2)}, x_{(3)}, \ldots$.

Central factorials may be defined by

$$x^{\{1\}} = x, \quad x^{\{2\}} = (x+\tfrac12)(x-\tfrac12), \quad x^{\{3\}} = (x+1)x(x-1), \ \ldots \tag{3}$$
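the factors being in arithmetical progression of common
difference unity and centred at x. The reader may verify that
the mean central factorials are

$$\mu x^{\{1\}} = x, \quad \mu x^{\{2\}} = x^2, \quad \mu x^{\{3\}} = x(x^2-\tfrac14), \quad \mu x^{\{4\}} = x^2(x^2-1), \ \ldots \tag{4}$$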
Given an odd number of values of $u_x$ with central value
$u_0$, the Newton-Stirling formula of interpolation is useful,

$$u_x = u_0 + x^{\{1\}}\mu\delta u_0 + \mu x^{\{2\}}\delta^2 u_0/2! + x^{\{3\}}\mu\delta^3 u_0/3! + \mu x^{\{4\}}\delta^4 u_0/4! + \ldots \tag{5}$$

Given an even number of values with two central values
$u_{-\frac12}$ and $u_{\frac12}$, the Newton-Bessel formula is the appropriate one,

$$u_x = \mu u_0 + x\,\delta u_0 + x^{\{2\}}\mu\delta^2 u_0/2! + \mu x^{\{3\}}\delta^3 u_0/3! + \ldots \tag{6}$$

These formulae use central and mean central factorials, and
mean central and central differences, alternately.

The origin of interpolation x = 0 can almost always be
chosen so that x, in the interpoland $u_x$, need not exceed ½.
The following relations are fundamental:

$$\Delta x^{(r)} = r\,x^{(r-1)}, \ \text{equivalent to}\ \Delta x_{(r)} = x_{(r-1)}; \qquad \delta x^{\{r\}} = r\,x^{\{r-1\}}.$$

(Cf. $Dx^r = rx^{r-1}$ in the Differential Calculus.)

2. Finite Sums. The following table of repeated
summation upon $u_0, u_1, \ldots, u_n$, exemplified for n = 4,
follows the scheme proposed in 19 for computing factorial
moments.

    u     Σ                  Σ²                 Σ³            Σ⁴         Σ⁵
    u0    u0+u1+u2+u3+u4
    u1    u1+u2+u3+u4        u1+2u2+3u3+4u4
    u2    u2+u3+u4           u2+2u3+3u4         u2+3u3+6u4
    u3    u3+u4              u3+2u4             u3+3u4        u3+4u4
    u4    u4                 u4                 u4            u4         u4

Scrutiny will show that the entries at the tops of the
successive columns of summation are the reduced factorial
moments:

$$\sum u_x, \quad \sum x\,u_x, \quad \sum x_{(2)}u_x, \quad \sum x_{(3)}u_x, \quad \sum x_{(4)}u_x, \ \ldots$$

This may be proved by an induction based on the relation
$\Delta x_{(r+1)} = x_{(r)}$, that is, $(x+1)_{(r+1)} = x_{(r+1)} + x_{(r)}$.

With a little more difficulty, using central factorials, it
may be proved that the scheme of repeated summation
toward the centre with alternate averaging, used in Ex. 3
of 19, produces reduced central and mean central factorial
moments $\sum x^{\{r\}}u_x/r!$ and $\sum \mu x^{\{r\}}u_x/r!$.
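
The scheme of repeated summation from the foot of each column can be sketched in Python and compared with the reduced factorial moments computed directly. The function name and the five frequencies are illustrative assumptions.

```python
from math import comb

def repeated_summation(u):
    """Tops of the successive summation columns for u_0, ..., u_n."""
    tops, column = [], list(u)
    while column:
        running, summed = 0, []
        for value in reversed(column):       # cumulate from the foot upward
            running += value
            summed.append(running)
        summed.reverse()
        tops.append(summed[0])               # entry at the top of this column
        column = summed[1:]                  # the next column starts one row lower
    return tops

u = [3, 1, 4, 1, 5]                          # arbitrary illustrative frequencies
tops = repeated_summation(u)
# Reduced factorial moments formed directly: sum of x_(r) u_x, with x_(r) = C(x, r).
direct = [sum(comb(x, r) * ux for x, ux in enumerate(u)) for r in range(len(u))]
```

Each column is one entry shorter than the last, which is what brings the successive reduced factorial moments to the tops of the columns.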





3. Relations between Powers and Factorials. We have

(i) $x = x^{(1)}$, $x^2 = x^{(2)} + x^{(1)}$, $x^3 = x^{(3)} + 3x^{(2)} + x^{(1)}$,
    $x^4 = x^{(4)} + 6x^{(3)} + 7x^{(2)} + x^{(1)}$;

(ii) $x = x^{\{1\}}$, $x^2 = \mu x^{\{2\}}$, $x^3 = x^{\{3\}} + x^{\{1\}}$,
    $x^4 = \mu x^{\{4\}} + \mu x^{\{2\}}$,

as may be verified by actual expansion. Multiplying any of
these relations by $u_x$ and summing over equally spaced values
of x, (i) with x = 0 as least value, (ii) with x = 0 as middle
value, we derive the relations quoted and used in 19, Exs. 1
and 3, for converting factorial moments, or central and mean
central factorial moments, into ordinary moments.
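
The conversion of factorial moments into ordinary moments may be illustrated in Python. The sketch assumes the standard expansions of powers in ordinary factorials, namely x² = x⁽²⁾ + x⁽¹⁾, x³ = x⁽³⁾ + 3x⁽²⁾ + x⁽¹⁾ and x⁴ = x⁽⁴⁾ + 6x⁽³⁾ + 7x⁽²⁾ + x⁽¹⁾; the function name and the frequencies are hypothetical.

```python
def factorial_power(x, r):
    """Ordinary factorial x^(r) = x(x-1)...(x-r+1), with x^(0) = 1."""
    out = 1
    for i in range(r):
        out *= x - i
    return out

u = [2, 7, 1, 8, 2, 8]        # hypothetical frequencies at x = 0, 1, ..., 5
# Factorial moments (not divided by the total frequency).
f = [sum(factorial_power(x, r) * ux for x, ux in enumerate(u)) for r in range(5)]

# Ordinary moments obtained from the factorial moments.
moment2 = f[2] + f[1]
moment3 = f[3] + 3 * f[2] + f[1]
moment4 = f[4] + 6 * f[3] + 7 * f[2] + f[1]
```

The results agree with the sums of x², x³ and x⁴ times the frequencies formed directly.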

4. Tables of Normal Probability Integral and Poisson
Function. A very convenient table of the normal probability
integral in standard scale, to four places of decimals, is given
in Bowley’s Elements of Statistics, p. 271. The table is
accurate enough for most practical purposes, and may be
interpolated by proportional parts, that is, only using first
differences. We give a compact table in 80, p. 144.

In the Poisson function the chief requirement is the value
of $e^{-m}$. If a machine is available, the following short table
enables $e^{-m}$ to be computed with sufficient accuracy for
m = 0 to 10.


     m    e^{-m}          m     e^{-m}       m      e^{-m}       m       e^{-m}
     1    0.36788        0.1    0.90484     0.01    0.99005     0.001    0.99900
     2    0.13534        0.2    0.81873     0.02    0.98020     0.002    0.99800
     3    0.049787       0.3    0.74082     0.03    0.97045     0.003    0.99700
     4    0.018316       0.4    0.67032     0.04    0.96079     0.004    0.99601
     5    0.0067379      0.5    0.60653     0.05    0.95123     0.005    0.99501
     6    0.0024788      0.6    0.54881     0.06    0.94176     0.006    0.99402
     7    0.0009119      0.7    0.49659     0.07    0.93239     0.007    0.99302
     8    0.0003355      0.8    0.44933     0.08    0.92312     0.008    0.99203
     9    0.0001234      0.9    0.40657     0.09    0.91393     0.009    0.99104
    10    0.0000454      1.0    0.36788     0.10    0.90484     0.010    0.99005


For smaller values of m than those given above the approximation
1 - m for $e^{-m}$ is correct to at least five decimals.

Ex. In the example of 42 we have m = 3.870. Entering
the above table at m = 3, 0.8, 0.07, we form the product,
thus: 0.049787 × 0.44933 × 0.93239 = 0.020858.
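
The product of the Ex. may be checked in Python against a direct evaluation of the exponential. This is a sketch only: the three factors are copied from the short table for m = 3, 0.8 and 0.07, and the variable names are assumptions.

```python
from math import exp

# Entries copied from the short table above for m = 3, 0.8 and 0.07.
factors = {3: 0.049787, 0.8: 0.44933, 0.07: 0.93239}

product = 1.0
for entry in factors.values():
    product *= entry                 # e^-3.87 = e^-3 x e^-0.8 x e^-0.07

exact = exp(-3.870)                  # direct value, for comparison
difference = abs(product - exact)    # less than one unit in the sixth decimal
```

The method rests on the index law, the value of m being split into its units, tenths, hundredths and thousandths.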

5. Linear Dependence, Functional Dependence, Correlation,
Statistical Dependence. These are concepts
which need careful discrimination. The functions $u_j(x)$,
where j = 1, 2, ..., n, are linearly dependent if a relation

$$c_1u_1 + c_2u_2 + \ldots + c_nu_n = 0 \tag{1}$$

exists identically in x, where one at least of the $c_j$ is not
zero. They are functionally dependent if a functional relation

$$F(u_1, u_2, \ldots, u_n) = 0 \tag{2}$$

exists identically in x. Linear dependence, for example, is
the case where F is a non-zero linear function. They are
uncorrelated if the product moment vanishes for each pair
$u_i$ and $u_j$ of the set.

Correlation and functional dependence are (48) not
necessarily the same. The simplest example is perhaps
$u = a\cos x + b\sin x$, $v = a\sin x - b\cos x$. Here u and v are
uncorrelated, yet are dependent in view of the quadratic
relation $u^2 + v^2 - a^2 - b^2 = 0$.
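
The example can be verified numerically. In the Python sketch below it is assumed, for illustration, that x is spread uniformly over one complete period and that a = 2, b = 3; neither assumption comes from the text.

```python
from math import cos, sin, pi

a, b, steps = 2.0, 3.0, 360          # illustrative constants; x sampled over one period
xs = [2 * pi * i / steps for i in range(steps)]
u = [a * cos(x) + b * sin(x) for x in xs]
v = [a * sin(x) - b * cos(x) for x in xs]

mean_u = sum(u) / steps
mean_v = sum(v) / steps
# Product moment about the means: vanishes, so u and v are uncorrelated.
product_moment = sum((p - mean_u) * (q - mean_v) for p, q in zip(u, v)) / steps
# Quadratic relation: vanishes for every x, so u and v are functionally dependent.
quadratic = [p * p + q * q - a * a - b * b for p, q in zip(u, v)]
```

Both quantities vanish up to rounding error, the first on the average and the second identically.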

To describe statistical dependence, we may say that
statistical independence is really obedience to the multiplication
theorem of probability. Suppose we have two functions
of n variates, u(x, y, z) and v(x, y, z), where we illustrate by
n = 3. They have each a probability, or probability density,
let us say $\psi_1(u)$ and $\psi_2(v)$, depending on the distribution of
x, y and z. They have also a compound probability, or
probability density, let us say $\psi(u, v)$. If for all the possible
values of x, y, z we have $\psi(u, v) = \psi_1(u)\psi_2(v)$, then we say
that u and v are statistically independent.

An equivalent formulation is by generating functions. If

$$G(\alpha, \beta) = \iiint e^{\alpha u + \beta v}\,\phi(x, y, z)\,dx\,dy\,dz, \tag{3}$$

where $\phi(x, y, z)$ is the compound probability density of x, y, z,
and if

$$G(\alpha, \beta) = G(\alpha, 0)\,G(0, \beta), \tag{4}$$

all integrals existing in some common domain of $\alpha$, $\beta$, then
u and v are statistically independent. By this criterion it
may be proved that the estimates m of $\mu$ and $s^2$ of $\sigma^2$ in a
normal sample (72) of n values $x_i$ are statistically independent,
so that the derivation of the t-distribution (loc. cit.) is valid.